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INTRODUCTION AND OUTLINE 
OE REPORT 

This report summarizes the work that has been done on the NASA 
Research Grant NsG-490 since the last reporting date of July, 1969. The 
report is divided into four separate parts titled: 

• Part I: Stochastic Approximation and its Engineering Applications 

Part II: Stochastic A.lgorithms for Self-Adaptive Filtering and 

Prediction 

Part III: The Control of Nonlinear Stochastic Control Systems Under 

Discounted Performance, Criteria 

Part IV: Linear Stability of a Nuclear Rocket Engine With Two 

Reactivity Feedbacks 

Each of the sepai'ate parts includes its own index, bibliography, and 
pagination, and the content: of each is discussed briefly in this intro- 
duction. The report has been labeled an interim report because work in 
three of the four areas is not complete, as discussed below. 

Of the four parts. Part I cn "Stochastic Approximation and Its 
Engineering Applications," is the only portion of this report that may be 
considered complete. This is a tutorial treatment of the subject of 
stochastic approximation that emphasizes the algorithmic approach to 
optimization in the presence of uncertainty or noise. The uncertainty 
or noise may arise from basic ignorance of the underlying phenomena, 
experimental errors, or inherent random fluctuations. The nomenclature 
Stochastic Approximation is used to emphasize the stochastic nature of 
the errors in, say, the process measurements, and the use of these 
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measurements (past and present) to calculate the approximate location 
of the optimum or goal. Of particular importance is the fact that 
the use of the stochastic approximation algorithms assumes no a priori 
knowledge of the noise statistics that are involved in the optimization 
problem at hand. This is an important practical consideration. 

The stochastic approximation theory that is described in Part I 
is the basis for Part II, "Stochastic Algorithms for Self Adaptive 
Filtering and Prediction." The basic goal of this research is to 
develop a self-adaptive solution to the problem of optimal filtering, 
prediction, and detection of stochastic signals imbedded in random 
noise. In particular, the random noise is considered to be unknown. 

This is In contrast to theories of Weiner and Kalman which require a 
complete knowledge of the covariance matrices of both the plant and 
observation noises. Rarely are such complete descriptions available, 
and, in fact, the requirement that the noise covariance matrix be non- 
singular has often resulted in unwarranted assumptions as to the nature 
of the components of the noise involved. In this -report, an unsuper- 
vised learning criterion is formulated from which self-adaptive 
algorithms are derived. These algorithms learn the optimum discrete 
time stationary Kalman filter directly. This eliminates both the' 
necessity of estimating the plant and noise covariance matrices as an 
intermediate step and the need to solve the entire set of filtering 
equations. The problem associated with the need for the nonsingular 
measurement noise covariance matrix is thus elminated or rather by- 
passed by using this alternate approach. It is shown that the stochastic 
algorithms developed for estimating the optimum filter converge in a 
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mean square sense with probability one. The results are valid for 
scalar and vector values for signal and noise processes. It is 
expected that the research described in Part II of this report will 
be completed by July 1, 1970, and a more complete and final report 
will be issued at that time. 

Part III on "The Control of Nonlinear Stochastic Control Systems 
under Discounted Performance Criteria," is similar to Part II in that 
it presents the theoretical basis for a Ph.D. dissertation. As in 
Part II, the system dynamics are modeled with difference equations, 
and the goal is to obtain, a practical algorithmic approach. Here, 
hoxjever, the problem is one of determining the optimal control, rather 
than one of obtaining optimum estimates, as above. The approach is 
through the use of dynamic programming in a partitioned state space, 
where the advantage to be gained over a conventional dynamic program- 
ming approach is largely a computational one. The discounting factor 
in the performance criteria, g, is required to insure convergence. 

The format of this presentation is largely one of theorem, lemma, 
and proof, with only two^ relatively simple examples. There seems to 
be real hope, however, that the optimal control methods developed in 
Part III may well be applicable to the control of the restartable nuclear 
rocket engine, where the discounting factor g may be related to failure 
probability, and the noise or variation in the system may be considered 
as due to changing parameters within the system, in particular the 
degeneration of the core. More will be said on this and other applica- 
tions in the annual report due in July. 
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The last portion of this report. Part XV, is concerned with 
"Linear Stability of a Nuclear Rocket Engine with Two Reactivity Feed- 
backs." As the title implies, this research applies directly to the 
nuclear rocket work. The object of this investigation is to define 
stabilicy boundaries for the nonlinear reactor in a number of 
parameter space. Much of the work was done .by simulation on an 
analogue computer for the linear system in preparation for attacking the 
nonlinear model. An attempt to simulate the nonlinear equations on the 
analogue computer proved unreliable, due to complexities involved. 

Work will continue in this area, with simulation to be done on the 
Electrical Engineering Departments hybrid facility. 
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I. INTRODUCTION 


The purpose of this paper is to provide an up-to- 
date investigation of the methods of Stochastic Approximation 
and its application to the Information Sciences. The 
discussion will attempt to give physical and intuitive 
meaning to the mathematical conditions of Stochastic 
Approximation rather than to reproduce rigorous proofs , 
which can be found in the referenced literature. 

1.1 Definition of Stochastic Approximation 

Stochastic Approximation is essentially an algo- 
rithmic technique for optimization in the presence of 
uncertainty. This uncertainty or noise may arise from 
basic ignorance of the underlying phenomena, experimental 
errors or inherent random fluctuations. The nomenclature 
Stochastic Approximation is used to emphasize the sto - 
chastic nature of the errors in, say, the process measure- 
ments, and the use of these measurements (past and present) 
to calculate the approximate location of the optimum or 
goal. In addition, no a priori knowledge of the noise 
statistics is required in Stochastic Approximation 
methods. Such stochastic problems are naturally more 
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difficult than deterministic problems. However, algo- 
rithmic search techniques, whether concerned with random 
errors or not, involve two fundamental considerations: 

(1) selecting a direction in which to move, 

(2) then selecting the distance to move (choos- 
ing a step size). 

1 . 2 Effect of Random Error on Convergence^ s 2 

The effect of random error on an algorithm may 
cause it to converge to some non-optimum value or even 
to diverge. Therefore, correct convergence (stability) 
takes priority over speed of convergence optimization in 
a stochastic environment. In Stochastic Approximation, 
this effect is reflected in the choice of step sizes, *■. 
consideration (2) above. The direction to move, con- 
sideration (1), is selected as if the process were deter- 
ministic. That is the experimental observations are 
assumed to be error free. This means that some step 
directions may be incorrect, but such set-backs are 
swamped-out in the long run by additional data if the 
step sizes are properly selected. Note this is nothing 
more than a modified statement of the law of large numbers . 

1.3 Intuitive Selection of the Step Size 

The following -statement by Poisson of the empiri- 
cal law of large numbers sheds commonsense insight on the 
method of Stochastic Approximation. 
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In many different fields, empirical phe- 
nomena appear to obey a certain general 
law, which can be called the Law of large 
Numbers. This law states that the ratios 
of numbers derived from the observation of 
a very large number of similar events 
remain practically constant, provided that 
these events are governed partly by con- 
stant factors and partly by variable fac- 
tors whose variations are irregular and do 
not cause a systematic change in a definite 
direction. Certain values of these rela- 
tions are characteristic of each given kind 
of event. With the increase in length of 
the series of observations, the ratios 
derived from such observations come closer 
and closer to these characteristic constants. 
They could be expected to reproduce them 
exactly if it were possible to make series 
of observations of an infinite length. 3 

It is upon this experiential truth that Sto- 
chastic Approximation methods, as well as all applica- 
tions of probability theory, are based. - Intuitively, 
then, one knows that if the present estimate (method) is 
backed by extensive observations (experience), then it 
should not be significantly altered by new data. The 
converse is true for an estimate based on relatively few 
noisy observations. A simple illustration of this is 
the sample mean. For example, let 


y = M -t e 


where y = measurable information 

e = experimental error (unbiased) 
and M = desired constant. 

Then after in observations ( y 1 , . . . , y ) , the best esti- 
mate of M Is 
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M n - 


1 

n 


n 

2 

i=l 


y i 


^ " 5> M n-1 4?., 


M 


n-1 


1 (y 

n. w n 


M n-1> 


( 1 ) 


.where M n _-^ = "old" estimate 
and y^ = new datum point 

Note that M nMl is weighted by (1 - ~) , whereas, y n is 
weighted in inverse proportion to the number of observa- 
tions, which approaches zero as n approaches infinity. 
However, this defense of the status quo is no longer 
valid if there exist changes in the process. In such a 
case, an adaptive weighting technique must be devised. 
This will be further discus.sed in a later section. 
Regardless, the method of weighting new data in propor- 
tion to l/n is of fundamental importance in determining 
the step size in Stochastic Approximation. This is 


because 


lim 

nr*°° 


n 


= 0 • 


Therefore, if the step sizes are decreased according to 
the harmonic sequence {-|~} , the corrections approach zero 
in the limit. This property is necessary for convergence. 
Second, 
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E 

n=l 


1 

n 


CO 


This property of the harmonic sequence guarantees that 
the correction process will not stop short of the opti- 
mum point regardless of the initial estimate., i.e., the 
sequence has an infinite amount of corrective effort. 
Third, 


E 

n=l 



< 00 


or, equivalently 


E 

n=N 



0 as N -* 05 . 


This property ensures that the cumulative effect of the 

2 

noise error variance remains finite. Why this is so 
will be explained in section 2.4. 

In Chapter II, specific Stochastic Approximation 
methods will be reviewed, and it will be shown that 
equation (1) is actually a Stochastic Approximation 
algorithm. 



II. METHODS OF STOCHASTIC APPROXIMATION 


Historically, two basic types of Stochastic 
Approximation were developed. The first, was the Robbins- 
Monro (R-M) procedure^ - for finding the unique root of an 
unknown regression function and the second was the 
Kief er-Wolf owitz (K-W) procedure for finding the maxi- 
mum of an unknown unimodal regression function.^ 

r 

Dvoretzky unified and generalized these earlier studies. 
Detailed reviews of the above results and their variations 
may be found in Derman,^ Schmetterer,® and Venter. 21 


2.1' Robbins-Monro Method 

The R-M algorithm is the exact stochastic analog 
of a simple deterministic algorithm for solving 

M(x) = k (2) 

where MrR 1 -> R 1 

and k is any real number. 

The deterministic algorithm is 


*n+l “ x n + a n ' M ( X „H (3) 

where is sequence of real numbers xvhich must satisfy 
certain conditions to ensure convergence (see Ref. 9). 

6 
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When there is random error presents M(x) cannot 
be measured, but a noisy observation y(x) of M(x) can be 
made. Now, however, y(x) 'is a random variable with a 
distribution function F(yjx) defined such that 

co 

E{yjx} = J y(x) d F(yjx) = M(x) for all x (4) 


Thus M(x) is the regression function of y on x. The 
problem is still to iteratively solve equation (2), but 
equation (3) is no longer meaningful, regardless of whe- 
ther F(yjx) is known or not, since M(x) is not observ- 
able. Under these circumstances, a stochastic version of 
equation (3) is defined 


x n+l 



+ a n [k - y(x n )] 


(5) 


where {x n } is now a sequence of nonstationary random vari- 
ables which converges in some stochastic sense to the 
solution of (2) . 

2.1.1 Convergence 

Robbins and Monro proved that the algorithm 

(5) converges in mean-square to the' correct solution, say 
A 

x, of (2) if the sequence {a^} satisfies the three con- 
ditions 


(a) lim a = 0 

T1-+CD 


0 >) 


n=l 


a n = 


( c ) 


E 

n=l 



00 


< CO 
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and the regression function M(x) can he bounded on either 

A P 

side of the solution x by a straight line. 

(d) M(x) < a x - x + b (a, b > 0) 

(e) E{ M(x) - y(x) 2 ] = a 2 (x) < a 2 < “ for all x 

The physical meaning of the conditions on is exactly 

the same as stated in section 1.3 where is interpreted 
as the step size. Note that the harmonic sequence {l/n} 
not only satisfies (a), (b), and (c), but also gives the 
fastest possible reduction of the step size without viola- 
ting any of the conditions; that is, for any sequence {“^3 

n 

CO 

2 -i— < °° for a > 1 

n=l n a 

Condition (d) is necessary to prevent an overshoot of x 
that cannot be corrected by a sequence {a.^} satisfying 
(a), (b), and (c). Condition (e) is required for the 
obvious reason that if the variance of the measurements is 
not finite for all values of x, then it would be impossible 
to guarantee conversion of the algorithm in general. 

Blum"* -0 and Kallianpur"^ established independently 
that the above conditions are sufficient for convergence 
with probability 1 of algorithm (5) . As in Ref. 2, the 
statement is often made in the literature on Stochastic 
Approximation that probability one convergence implies 
mean square convergence. This is not so. However, mean- 
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square convergence does imply convergence with probabil- 
ity one under certain conditions (see Ref. 36 ) , but not 
in general. 

2.1.2 Root Finding and Extremum 

To use the R-M technique for finding the 
unique zero of M(x), one simply lets k = 0 in equation 
(4) giving 

Vn = x n - a n y ( x n> ■ < 6 > 

If M(x) has multiple roots, then there is no a priori way 
to know to which one equation (6) will converge. Starting 
from the same initial estimate x Q , (6) may converge to a 
different zero of M(x) each time the iteration process is 
run. This effect is a result of the noise in y(x ) . 

The R-M method can be made to search for the 
unique extremum of M(x) with no inflection points, by 
simply searching for the root of M 1 (x) . If feasible, this 
is the most effective Stochastic Approximation procedure 
for extremum searching, i.e., it gives a faster convergence 
rate than the K-W method. This approach -is difficult 
•because it must be assumed that 

1-. M(x) is everywhere differentiable. 


2. M'(x) =|_ E {y( X )| X } = E{|_y( x )|x}. 12 

and the measurements of ~ y(x) will generally be extremely 
r 

noisy." ’ These problems lead naturally to the K-W proce- 
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dure, which will he discussed in section 2.2 after a com- 
parison of the rate of step-size reduction for the R-M 
algorithm and a deterministic algorithm. 

2.1.3 Stochastic Vs. Deterministic Step-Size 
Reduction 

Such a comparison is informative because it 
will make salient the effect of noise on the rate of step- 
size reduction. For simplicity, the deterministic Bolzaro 
procedure will be used. It successively halves the step 
sizes. 


X n+1 x n 


x. 


n 


x. 


'n-1 


1 

2 


For comparison with the R-M algorithm, it is necessary 
to use expected values since 


x n+l ~ x n 


a n y (- x n> 


depends on the noise in the particular measurement y(x ) . 

' V , 

Therefore 


E 

^n+1 “ x n 

a n E HV 


E 

x n " x n-l 

a n-l E 

y ^ x n-l^ 


= a n 

a n-l M V n _i) 

_ n-1 
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where the harmonic seauence has been used for fa ] 

n ’ 

Still, the form of M(x) must be known. First, let it be 

constant for all x, - 

then 


E 

X , “X 1 

1 n+1 n I 

! n-l 

*i 

H 

l 

X* 
1 i 

! n 

! 



( 7 ) 


Now use the other extreme M(x) = Ax where A is large. 
Then 


E: 

' X n+1 “ x n 

(n-l) Xx n 

(n-l) x n 

E] 

l x n - x n-l| 

n Ax: 

n-l 

n x n-l 


but now x = x. 


n 


c n-l - a n-l M ( x „> = x n-l ' 


Ax. 


n-l 


n-1 


Therefore, the ratio reduces to 


E x n+l - x n 
E | x n - x n-l 


H" 1 f 1 - wf = 1 - ^ (8) 


n 


In both cases. 


Ex. 


h+1 


x.. 


n 


lim „ 

v. iii X X 

n-»~ n n-l 


= 1 


Thus the noise makes it impossible to decrease the steps 
as rapidly as in the deterministic case, especially late 
in the search. Equation (7) can be obtained from (8) by 
letting A ~ 0. Equation (8) is incorrect in Ref. 2. 
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2.2 Kief er-Wolf owitz Method 

Following the idea of Robbins and Monro, Kiefer 

and Wo if owitz constructed an algorithm for finding the 

extremum (maximum or minimum) ' of an unknown unimodal 

regression function M(x) . Their process is the exact sto 

chastic analog of the deterministic iteration procedure 

1R 

first formulated by Germansky; J his procedure was essen- 
tially a form of steepest descent, 

x n+l " x n ‘ a n 


where & n is chosen s M(x n ) < M(x ) . 

When M ,_ (x ) is unknown and/or noisy, it is neces- 
sary to approximate it in some way. The K-W technique 
uses two measurements of the observable function y(x) 
at (x^ + c ) and (x - c ) to obtain an average slope 

y(x n + c n ) - y(x n - e n) 

2 c n 


which is used as the approximation to M^x ) ? This 
gives the K-W algorithm analogous to (8) 

y ( x n + c n) “ y ( x n “ c n> 


x n+l = x n - a n 'f- 


2 c. 


•] ( 9 ) 


'n 


This iteration process converges in mean- square and with 
probability 1 to the minimum of M(x), say x, if 
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(a) both the step size and the distance 

between measurements approach zero 

lim a = 0 lim c ” 0 

n. n 

rr>co 


(b) As in the R-M method, S a < « 

n=l 


to assure that the correction does not stop 
short of the minimum x. 

«° a p 

(c) S (— ) < a so that the random effects 

n=l c n 


will tend to offset one another in the long 
run. 

(d) To prevent excessive over-correction, a 

restriction similar to condition (d) of the 
R-M process is required. It is 




A 

X 


+ b <■» 


i.e., the average slope of M(x) for any pair 
of measurements can be bounded by a straight 
line . 

2 

Since, even the function M(x) = exp(x ) satisfies reauire- 

2 

ment (d), it is not a severe restriction. 

Even though the K-W algorithm is designed for uni- 

modal functions, it is interesting to examine its behavior 

lit 

on a multiple peak M(x) . ' Such an example will also 

illustrate how to use the algorithm. The example is shown 
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in Fig. 1. Here the maximum of M(x) will be sought so 
the minus sign in equation (9) must be changed to a plus 
sign giving 

f y ? , (10) 

where the sequences a^ = l/n and c n = ~\j 1/n have been 
chosen to satisfy conditions (a), (b), and (c). The noise 
in this problem is additive Gaussian with zero mean and 
unity variance. Therefore., y(x) — M(x) + e. The process 
is started by arbitrarily selecting x^ e[0,5]. Measure- 
ments of y(x) are then made at (x 1 + 1) and (x-, - 1) as 
shown in Fig. 1, where the results for three different 
starting points are illustrated. The convergence to the 
absolute maximum is rapid for an initial x of 0.25 and 
2 - 50 , but for an initial x = 3-00, the iteration con- 
verges to the local extremum M(x) = 12.00. Because of the 
noise, however, it is not possible to predict to which 
extremum of a multipeak regression function that the algo- 
rithm will converge, even if the same initial point is used 
each time. 

2-3 Generalized Process of Dvoretzky 

The basic notion of Stochastic Approximation is 
that for any deterministic algorithm, there exists a sto- 
chastic counterpart, i.e., an algorithm where uncertainty 


x. 


'n+l 


= x. 


n 


1 r y(x ft 

n L 


+ 
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n 

X 

n 

c n 

M(x n - C n ) 

yfv^n) 

y( x rf c n' 

1 
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5.00 
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End 
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.63 

10.20 

11.04 

10.57 

11.07 

5 

3.58 

.58 

8.5 

11.88 

11.25 

11.63 

6 . 

3-39 

etc . 






50 


3.00 


Converges to the local .peak @ x = 3 

Figure 1 



is present in some form. Following this idea, Dvoretzky 
formulated a generalized Stochastic Approximation method 
consisting of a deterministic algorithm T with a super- 
imposed • random component e. 


x n+l = T n ( x i’ • • ' ’ x n> + e n t 11 ) 

where T^, n > is a sequence of Borel-measurahle mappings 

- 

from R (n-dimensional Euclidean space) into R . 

Dvoretzky proved the following theorem for this process.^ 


Theorem 1 . Let a n , P n , Y n be non-negative functions from 
R n into R^ 3 

a n^ x l J * * * * x n ) = ® uniformly y sequences x^, x^,, . . . 


^ £ n ( x l> * ■ * > x n ) converges uniformly y sequences 


lim 

rr*«> 

%< x i 

CO 


E 

n=l 



X 2 

CD 


E 

n=l 

Y J x i 


X 2 

Further, . 

T ( 
n' 

X 1 ’ * * 


^ Tn^ x l* * * * 3 x n ) diverges i,o °= uniformly y sequences x-^, 

y • • • 


A 


x p ) - x < max ta n , [ (1 + p n ) x n - x| - y n J } 

( 12 ) 

CO 

y(x 1 ,..., x n ) eR'. Also require 2 E{e 2 } < = (13) 

n=l n 


and E{e 


^ x n 3 — 0 with probability one. Then x^ as 
defined in equation (11) converges in mean square and with 
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probability 1 to x, 


lim E 

n ->oo 


x 

n 


A 

X 


0,, P{lim x n = ^} = 1. 

]TV-»co 


It is important to point out two features of this 
powerful theorem. 

(a) Since T may be a function of all the obser- 
vations (x^, . . . , x n ) , the correction may be based on all 
past measurements,, instead of Just on the latest measurement 
x^ as in the R-M and K-W methods . 

(b) The sequences {ct } , {P n }.» and {y^} can depend 
on the measurements (x-, x ) . 

For example, these ' properties make it possible to devise 
a stochastic Newton-Rapson method or a sequential least 
squares estimator based on the last m observations, m < n. 
The resulting accelerated convergence is obtained at the 
expense of computational simplicity. 

. As another illustration of the versatility of 
this theorem, consider the sample mean given by equation 

(i), 

“n'^n-l + H (y n - M n-1> W 

where M is the unknown mean. By defining the noise-free 
algorithm to be 

T = (1 - a ) M n + a M, = a = - 
n ' n' n-1 n n n n 
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and. the superimposed random component to be 

e = a (y - M) , 
n n w n 11 

then ML = T + e_ (la) 

and |- n " mJ a (1 - a n ) • - M j is a special case of 

(12) and e n satisfies (13)- Therefore, (1) is a special 
case of Stochastic Approximation, which implies 

lim E [M - M)^} = 0 and P{lim M. - M} = 1. 

n~»co n^ 05 ^ 

Even though simple, this example is important because it 
contains the idea of estimation of. an unknown, but con- 
stant system parameter. If the parameter is also time 
varying, then can be made dependent on the last m < n 
measurements. The result is an adaptive parameter estima- 
tor, e.g., see section 4.5- 

2.4 The K-W and E-M Methods as SpecialcCaJsessof 2 
Dvoretzky's Process 

By defining T^ and e n as 

T n = x n + 2^ tM ( x n + °J ~ M ( x n ‘ °n> 1 O a ) 

a 

e n = 2^ [y( x n + c n> ’ M < x n + c n> “ y( x n ‘ c n> 

+ M(x n -o n )], (9b) 

then x n+1 “ T n + e n gives the K-W algorithm, equation (9) . 



Similarly, choosing and e n as 

T n = x n + a n “ M ( x n)J (5a) 

e n “ a n [M( x n ) - y(x n ) ] ' (5b) 


then x n+ ^ - p, t- gives the R-M algorithm, equation (5). 

At this point, the necessity of requiring that the 

sura of the observation variances be finite (equation (13) 

of Dvoretzky's Theorem) will be explained. For simplicity, 

e n unbiased so that the total measurement variance 

is S E{e 2 ) = a 2 
n=l n e 

- 2 09 p p N p 

- L ^ a e a > ^{e^ ^ = CT pi - 2 E{e n } which is just 


n=F+l “ “ n=l 

uhe variance for the measurements remaining after N trials. 


ST 


op 03 

Since lim 2 E{e } = a , lim 2 {e = 0. 
K->“ n=l e n=N+l n 


Therei ore, the variance of Ihe error approaches zero as the 
number of observations increases to infinity. This 
property is obviously necessary for convergence and holds 
°uly if is finite. It is the selection of the step 
sizes that ensures this condition; for example, in the R-M 


process 
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“ O O P P 

- S a 2 Ef[M(x ) - y(x )]t = 2 a o (x ) 

n=l n n=l 

< ct E a , where a > a > a (x ) , v n 

n=l n 

" 2 

< « iff i E a * < ®. 

n=l n 

2 

Hence the requirement that 2 a < 03 in the R-M process 

n=l n 

60 a p 

and similarly for 2 (-^) < “ in the K-W process. 

. n=l c n 



III. OTHER PROPERTIES AND EXTENSIONS 

OP STOCHASTIC APPROXIMATION METHODS 

In this chapter the following topics will be dis- 
cussed : 

1. multidimensional Stochastic Approximation 
algorithms 

2. generalized regression functions 

3- asymptotic distribution of the estimates 

3 . 1 Multidimensional Stochastic Approximation Algorithms 
The multidimensional R-M and K-W methods x^ere 
introduced first by Blum., who used a LJyapunov type 
approach to prove convergence with probability one of the 
two methods. For the R-M process the algorithm is 

. 2£ n+1 = 2£ n + a n [K - X (x n )] (14) 

where -y K, xeR m 3 an m-dimensional random variable 
y(x) e 

E{y|x} = E{y(x)} = M(x) exists. 

Thus, the problem is to find the solution to M(x) = K. 

To guarantee convergence, the sequence {a n 3 must satisfy 
the conditions previously specified, and there must exist 
a Lyapunov type function V(x) such that 
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V(x) > 0, V x 


and 

m dv(x_) 

(^V(x), M(x)) = X ax. M i(-) - 0 ’ Vx 

Fortunately, this convergence can also he established by 
an extension of Lvoretzky ! s Theorem by simply replacing 


absolute values 


'A 

x n - -x 


with the norms 


A 

*n ~ ^ 


This avoids the search for a suitable Lyapunov function. 

,-l 


if a positive definite m x m matrix R 
equation (l4) giving 


is inserted in 


* n = x n _ 1 + a n R [K - y(x(n)) ] (15) 

the convergence is not affected. . ..• : v And, if one 

knows E[y y T ], then choosing R = E{y y T } _1 decreases the 
variance of the estimates . Rote that there is still no 
assumption on the noise structure. later a recursive 
scheme for calculating R will be developed. 

The multidimensional K-W algorithm can take sev- 
eral forms, but in each case difference approximations are 
needed for every component of-v,M(x). Blum’s method 
requires m-M observations at the points 


o 

X = X 

— n — n 


z(2 n ) = y(<) 

1 

x 

l(4Vl) = y( x n) 

— n — n 4 c r^l 

_> 


m 

— n ~ — n 


c e 
n-m 
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where e. denotes a unit vector whose j element is 1. The 
measurements y(x^) determine the one - sided derivative 
approximations A _y; n where 

in 

Then the recursive relation 

in+l = ^n ' a n A in ( l6 ) 

converges with probability one to the minimum of* M(x) . 
However, Sacks has shown that the asymmetric observations 
about x n cause slow convergence to the correct x. 1 ^ 

Based on the extension of* Dvoretsky's Theorem, 
Gray proved that the symmetric version of (16)- converges 
with probability one and in mean square. 1 ^ It is 


—nil ~ — n 



where 


(17) 


Ay = 
— n 

i — j 

N 

JF 

I 3_H 

- y(x “ 
— v — n 

) > * • • ; 

y(^) y(x 

'audio 

= X 

i c e . 

and x"'-* 

= x - c e . 


— n — n 

n-0 

— n 

-n n—j 

J = 1, 

— , m 





■n 


n 


This algorithm converges faster than the previous one, but 
requires 2 m observations (2 for each dimension) instead 


of mil. 
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3 • 2 Generalized Regression Functions 

The purpose of this section is to lay the founda- 
tion for the interpretation of the regression function as 
a performance index so that the methods of Stochastic 
Approximation can he applied to communication and control 
systems. Since Stochastic Approximation methods are 
applicable to any problem that can be formulated as one of 
regression, the extension is not difficult. 

First, assume the observations- are from a random 
process y(t) and there exists a function t(.,.) which 
depends on y(t) and a parameter vector k. The performance 
function & (y, k) determines the performance index L(k) 
defined as a regression function 

L (4) = E y ^(y>4)} (18) 

CD 

= J My,k)d F(y) 

— CD 

It is desired to minimize L(k) by selecting the optimum 
k = k. 1 If L(k) is a convex function of k, then k is 
given by 

• V&) =0 (19) 

if L(k) is known, equation (19) can be solved iteratively 

*1 $ 

by the gradient method giving^ 0 

-n+1 = — n “ a n Vk L ( E n ) 
where k ^ converges to k. 


( 20 ) 



STote that deterministic problems can be put in this format 
by letting the density function f (y) ; be a delta func- 
tion. However, when the distribution function F(y) is not 
given a priori, L(k) is not known. This condition is 
precisely the motivation for Stochastic Approximation 
techniques. Thus if £(y,k) is differentiable, the R-M 
method gives 

— n+1 = k n a nv k^n+l' -n^ ( 21 ) 

as an iterative solution to equation (19). When .&(y,k) 
is not differentiable, the K-W method gives 

—n+1 = -n - a n A * n ( 22 ) 


where A _£ n is the vector whose jth component is 


.3 = IlRn+l' ' ^ y n-KL' 

2 c 


n 


n 


1, . . . , m 


Algorithms (21) and (22) can be shown to converge in mean 
square and with probability one (see Refs. 6, 12, 15, and 
17) for most problems in engineering application. The 
most restrictive requirement is that jfc(.,k) have a unique 
extremum . 


Rote the similarity between these stochastic algo- 
rithms and the deterministic algorithm (20) . However, 
since V-^i(y,k) in equation (21) or its approximation in 
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equation (22) depends on a single realization of the 
random process y(t) which may contain noise , k^ is a 
non- stationary random vector. 

In the important special case where the perfor- 
mance index is the mean- square error, algorithm (22) 
reduces to algorithm (21). 12 For example, let 

(1) x(t) be a noise corrupted signal (the noise 
is not required to be additive), 

(2) S(t) be the desired signal. 


(3) ^F^t) - &(t,k) be the estimate of S(t) 


where the k. are the adjustable parameters 
that weight the outputs of the filters 
F.(t) such that the minimum mean-square 

O 

error is obtained. This form is general 
since if a sufficiently large number of 
F.(t) are used, the overall filter can 
approximate arbitrarily closely any non- 
linear operator.^ 

(4) The error e(t,k) = S(t) - S(t,k) 

(5) The performance function 4(e(t),k) 

= & ( e (t,k) ) = e 2 (t,k) 

(6) Thus L(k) = E{e 2 (t,k)} 

For discrete values of t, algorithm (22) can be used to 
minimize L(k) , where the Jth component of A becomes 
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i _i m 

A •% = ( 2 C J " C*[S(n) “ 2 k p (n) - c(n)P.(n)l 

i=l J 

m 

- *[S(n) - _2 k^Cn) + c(n)P J (n)] 
Since -2(e) = e^, this reduces to 


i r = (2 
n ' 


n 


m 

) -»[S(n) - kiP . (n)] c n P (n) 3 


2[S(n) - ^ ^.(n)] Pj(n) (23) 

=yr- i ( e ( t -is)) 

' 3 


j = m 

Hence, A A ^ = y ^(e(t,k)) which means the K-W procedure 
reduces co the R-M procedure for the mean-square error 
criteria. This is important because the R-M algorithm 
is computationally simpler and converges faster. 


3-3 Asymptotic Distribution of Stochastic Approximation 
Estimates 

Even though Stochastic Approximation methods are 
nonparametric (no assumption regarding the form of the 
distribution function of the noise is necessary), it can 
be shown that under rather general conditions the esti- 
mates are asymptotically normal. 
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Considering first the one -dimensional R-M algo- 
rithm equation (5)> Sacks proved that for a^ = A/n the 
random variable 

YT (X n - x)'is M [Oj 


where <j' 


= sup x E{ 


y(x) - M(x) 


12 


} < 


and C = M' (x) = slope of the regression function at x = x. 
m the multidimensional case, the random vector 

Vn(x n - x) is also asymptotically normal NfO^QP -1 ], 
where 

PQP 1 is the covariance matrix 

o * 

Q has entries q^. = A c (ab ± + Ab^ - 1) ir.^ 

* 

TT = P TfP 


7T = lim [y(x) - M(x) j [y(x) - M(x)] T 
A 

X~*X 

A n 

v x M (x) = B = PDP 

D = diagonal matrix of eigenvalues (b j=l, . . . , m) of B 

J 

P = orthogonal matrix o B = PDP""' 1 ' 


For the uni- dimensional K-W algorithm of equation 
(9) j the random variable - x) is again asymptoti- 

cally normal 


N[0, 


.22 

A a 

W A-l) 


3 



29 


where 


o * = sup^ E{. y(x) 
and 


- M(x) ^} < co 


An 


C = - * M (x) 


In the multidimensional K-W process the random vector 
Va c n (2£ n “ k) is also asymptotically normal N[0, PQP -1 } , 

where Q has entries q. . = A 2 (4Ah. + 4Ab . - l)” 1 tt.T 

"1,1 x 1 .1 ' xj 


* _i 

tt - P irP 


TT = 


lira [y(x) 

A 

x-^x 


- M(x)] [y(*) “ M(x)] 


T 


B = PDP 


-i 


D = diagonal matrix of eigenvalues of B 
P ~ orthogonal matrix 3 B = PDP - “ 


For details and proofs see Ref. 16. 

In the next chapter , techniques for accelerating 
the convergence and increasing the efficiency (in the 
statistical sense) of the estimators will be considered. 
The discussion will be limited to the R~M process, since 
analogous results hold for the K-W process. 



IV. METHODS OF INCREASING- THE RATE OF CONVERGENCE 
AND EFFICIENCY OF THE ESTIMATORS 

In algorithmic techniques, one wants large step- 
sizes when the goal is far away and rapidly decreasing 
step-sizes as the goal is approached. Historically, 
Kester was the first to present such a procedure for Sto- 
chastic Approximation methods.^ 

4-1 Kesten 1 s Acceleration Method 
For the R-M algorithm 

x n+l = x n “ a n y ( x n ) (24) 

this procedure simply keeps the value of a. n constant until 
the sign of the observation y(x n ) changes, then a p is 
decreased in a manner that satisfies Dvoretsky's Theorem, 
ine motivation being that when the zero of equation (24) 
is not near at hand, then the measurements of y(x n ) will 
in general be oi the same sign. However, -as the goal is 
approached, overshoot will occur causing the estimates to 
oscillate about the zero of M(x n ) . In the latter case, 
the step sizes should be decreased rapidly. The table 
belov/ illustrates the technique.^" 
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Measurement # 

1 

2345 

6 7 

8 

Total 



- 



move- 

ment 

Sign of 
Measurement 

a. 

4 

+ + . - 

+ 

-h 


Unaccelerated 
values of a 

n 

1 

1/2 1/2 -1/4 -1/5 

1/6 -1/7 

1/3 

149 
“ 180 

Accelerated - i- 






values of a 

n 

1 

1 a -1/2 -1/2 

1/3 -1/4 

1/5 

0 17 
2 W 


Cruz -Diaz has suggested a normalized R-M method 
x n+l = x n “ a n sgn (25) 

which converges under the same conditions as the regular 
R-M algorithm. 2 This approach greatly accelerates con- 
vergence for regression functions such' as M(x) = xe -x 
whose amplitude is very small for values of x much greater 
and much less than the actual ’zero x. 

4.2 Dvoretzky 1 s Optimum Sequence 

In Ref. 6, Dvoretzky proved the following minimax 
result for the sequence . 

ineoxem 2. If the random variable y(x) satisfies 
2 2 

(x)3 < a < co and whose regression function 
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M(x) = E{y(x)} is bounded by 


0 < A x - x 
and it is known that 


< M(x) < B 


A 

X - X < 00 



then the seauence of 


a._ = 


AC 


_ A V n 


2 + nA 2 C 2 V n ” A+B 


(26) 


(27) 


(28) 


a 

yields the upper bound 

T7 -rtf A » 2-j c 2 C 2 

max V n = max Six - x) j = —5- — p— o 

x. n a + (n-1) K~QT 


'n 


'2 3 ) 

j 


x 


■n 


n 


+ A v- 


(29) 


n 


The sequence defined by (28) is optimum for the R-M 

process of equation (24) in the sense that for any other ' 

sequence, the upper bound given by (29) is violated. 

The result is minimax since a is chosen so that the 

n 

maximum possible" value of equation (29) is minimum. A 
heuristic proof of this theorem is given in Ref. 2. By 
using equation (27 ) , the constant C can be eliminated from 
(28) and (29) giving 

(30) 


a = 


A 2 

max Vn = max E{(x - x) } = 
x n x.. 


A[n + M 


2 A 
o a 


n 


A[-Sg^ + (n-1) A] 


( 31 ) 
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Expression (30) indicates that the optimum sequence in 
the minimax sense is not harmonic and does not depend on 
the noise. The one case where a^ reduces to a harmonic 
sequence is xtfhen M(x) is a straight line, i.e., A = B. 

Then a^ = The effect of the noise shoxv's up only in 

the variance of x. n or uncertainty in the location of x 
given by equation (31) . Note that it is A , the slope of 
the loxtfer bounding line, that determines the size of the 
interval of uncertainty for large n. Thus, if A is small, 
the interval of uncertainty remains significant for a much 
larger time . 

The entire discussion of the last paragraph has 
been predicated on the assumption that jx-^ - x < C, 
where C is a knoxvn constant. If this is not true, then 
equation (30) is not -the optimum sequence. However, a 
minimax solution can still be found. There are two worst 
cases; one x^here the expected value of the measurement 
at x 1 = 1 x 1 falls on the upper bound B x - x and the 

Q 

Other where x-^ = x-^ falls on the lower bound A x - x[ 

(see Fig. 2) . In the first case, 

*{\) = B ( lx 1 ~ *) 

Therefore, since x g = - a 1 y(x- L ) and 

Slxg} = x 1 - a-jM(x^) , 
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E^Xg} - x = 1 x 1 - a 1 M( 1 x 1 ) - x 

i ' _,,l A\ A 

= “x-, - a 1 B( x 1 - x) - x 

= (1 - a 1 B)( 1 x 1 - x) (32) 

Similarly, in the second case, 

E{ 2 x 2 ) - x = (1 - a 1 A)( 2 x 1 - x) . (33) 

For any given x^, equations (32) and (33) become .the 
inequalities 

E{x 2 3 ~ x < (1 - a jB) (x^ - x) ('34) 

E{x 2 3 - x > (1 - a 1 A)(x 1 - x) 

S{x 2 ] - x < (ajA - l)(x 1 - x) (35) 

The largest possible error in E{x 2 3 is the greater value 

of . (34)' and (35), 

* A 

max [E{x 2 ] - x] = max [(1 - a 1 B)(x 1 - x) , (a^A _ 3-) 

(x x - £)] (36) 

For the minimax solution a^ is selected so that (36) is 
min imu m. This obviously occurs when both terms on the 
right-hand side of (36) are equal. Therefore, 

1 - a-jB = (a-jA - 1) a ± = A + B 
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Then 


rain max [E{x 0 - x}] = 


B 


B -1- A 


A (* n ~ x) 


Thus the minimax choice of behaves as if the regres- 
sion function M(x) were a straight line of slope (A+B)/2. 


So in general if 
2 


, A 

- x 


equal to 


A + B 


> Cj all terms of {a.^} are set 

Then as soon as ix - x < C, the terms 

n 3 


of {a.^} are reduced in accordance with equation (30).' 

Since the asymptotic distribution of the R-M 
estimator 


; 2_2 


V n~ (x - x) is N[0, • — — — ] , 

(2 AC -1) 


the asymptotic distribution of x is 


N[x, 


A 2 2 

A o 


n(2AC - 1)' 


-]- 


Choosing A to minimize the variance of x n+1 gives A = C, ~. 
So 


and 


Var {x^} 



(37) 


a 


n 


nC 


(38) 


is the sequence that gives the lowest asymptotic vari- 
ance. The conclusion is that Dvoretzky's minimax sequence 
sacrifices - long-term efficiency for short-term efficiency. 
Note that in the vicinity of the zero x, the regression 


function may be closely approximated by the straight line 
M ( x n ) ft* C(x n - x); therefore, y(x n ) C(x n - x) + e. 



37 


If the experimental error (which has mean 0 and variance 
2 

a ) is also normally distributed, then y(x n ) is 
H[ C (x n - x), a ]. The Rao-Cramer lower bound on the 


variance of unbiased estimators for x is given by 

A x 

-11 = i . 

min 

A s 


1/n E{[ , a , Xn ,. pfe; .. x) , ] g ) = ^ 


E{[ Sto p(y;x) jSj = E[ Cfr - C(*n - »)) 2} 
^ a 2 


= t: E{[ y - c(x n - x)] 2 3 


O' 


cf 2 _ 

4 a ~ 2 

a a 


Thus V . = which is exactly the asymptotic variance 

„ y CZ- 

of x^, given by equation (37). Consequently, for the 
case of Gaussian noise, the R-M algorithm gives an un- 
biased asymptotically efficient estimate of x. 


4.3 Summary of Section 4.2 

In the previous section one sees three stages of 

the algorithmic search, in each of which the selection of 
* p 

the coefficients a^ differs. The first stage Is when 
A 

the goal x is far away. Here, the coefficients should be 
largest and such that 

2 

a_ =„ — — — n - 1, . .., m 


n A + B 
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Secondly, when x n is close enough to x to satisfy equa- 
tion (27), then the coefficients are set equal to 



Finally, when x^ is near enough to x for M(x) to he 
linear, the coefficients should he 


a p ~ n - p + 1, ... 

where C, - M(x) . 

In practice, it is impossible to exactly carry out this 
procedure because 

(1) the hounds A and B on the regression func- 
tion M(x) must he estimated in general, 

(2) the constant C in equation (26) is known 

since the experimenter selects it, hut x 

is unknown so it is not possible to deter- 

I A 

mine precisely when x^ - x < C, 

(3) the slope of the regression function at x 
must also he estimated since both M(x) and 
x are unknown. 


4. -4- Another Minimax Method 

One rather obvious method of accelerating the con- 
vergence of x to x is to simply average, say N, obser- 
vations of y(x) and use this smoothed measurement z(x) in 
place of the y(x), i.e.. 
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let z ± = ~ [y(x 1 ) + . • . + y(y- n ) ] 


z ? = t? iy ( x w 4 .i) + ••• + y(xp ra )] 


N 


mi 


2Ir 


z n “ N f y ^ x N(n--l)+l^ + * * * + J ( X Wn) ^ 
and the R-M algorithm of equation (6) becomes 


x . n = x - a 
n+1 n n n 

and assuming the random error is stationary, 

Var [y(x)] = o 2 (x) = o 2 


09) 


Therefore, 


and 


Vto [Z n ] = f- 


H 


EtZ n } = S M(x) = M(x) 


so the bounds of equation (26) hold for E{Z n ) . Using 


-V 


2ct 2 /W 

A(B-A) * 


a. 


AC 


h 2 p p 

~ + nA C 


2 2 

and max E{(x - x)} = max Var {x } = c g ^ p 

x„ n n cr/N + (n~l)A 2 C 2 


n 


which are the equations of algorithm (39) analogous to 
equations (27), (28), and (29) of algorithm (24). With 
these analogous forms, one obtains the following minimax 
result for algorithm (39) 
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max Efx 

n 


x] 


^ = A[n 4- ] 

2 

max Var{x } = — 

* n n A[^A + (n-1) A] 


(40) 

(41) 


This result indicates, as would be expected, that a^ is 
unchanged and max Var fx^} is reduced proportionally to 
the number N of observations used in the smoothing pro- 
cess. TrJhat this process has achieved, for some given 
max Va.r[x n 3, is a reduction by the factor N of the 
number of algorithmic iterations required, but the num- 
ber of observations of y(x) is not reduced. 

At first glance it appears that the same reduc- 
tion in variance would accrue with an associated reduc- 
tion in observations required if we let 

Z 1 = y ( x l) 

z 2 = | [yteq) + y(x 2 ) 3 

z n = n [y ^ x l) + * • • + y ( x n ) 1 (42) 

However, I tried this approach using Dvoretzky's method 
of attack (see Ref. 6), and ended up with equations (30) 
and (31), which indicates no reduction of the variance 
in x n . The reason for this is because the z n defined by 
(42) contains no more information about M(x), excepting 
the new observation y(x n ), than does z n 
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4-5 Acceleration and the Method of Least Squares 

A standard problem in optimal, filtering is : 

Given n measurements of y where 

y± ~ ~ r i = lj 2, . .., n (43) 

_c is a known 1 x k row vector which may change with i, 
x is an unknown, but constant k vector, and e is an un- 
biased random variable: find the estimate x^ of x such 

bhat J n (x) is minimized. For a least squares solution, 

J n is 

J n(2) = *£ E n - - c n x) T (y n - eye) (44) 

where E T = [ e . , .... e ] 

i n J 

Y n = *- y -r * * *' y n^ 

C 11 c 12 * ‘ * c ik 

C n = C 21 c 22 * * * c 2k 

* ♦ . 

_fnl C n2 c nk_ 

The solution is 





ihe sab sc rip l n represents . number of observations of 
y from which the least squ ; s estimate ^ of x is made.. 
Fnat is now needed is a r -.rsive version of (45) so that 
new data can be incorporated iteratively as it is 
received. This is achieved as follows: assume another 

observation y n+1 = c. n+1 x + e^ is made, then 


~ C n+l C n +1 - —n+1 C n+1 Y n+1 


wnere 


and v 


Y n+1 


Thus 


r n T „ , T _ a 

^ C n C n + -n-fl -n+1^ -n+1 


C n + + —n+1 tx+l 


= Cc n y\ + 4y n+1 


Subtracting C_c^ +1 — n+l^ frora both sides 


gives 


[C T C + C T c 1 fx - \ \ - o T •«, n 

n n —n+1 —n+1 ^ '•—n+1 — n' —n+1 y n+l ^ c n+l —n+1 ^ 


—n+1 ^- y n+l ~n4.i x rJ 


•n+1 n- 


^ + ^c n + cS +1 c n+1 r 1 4, 1 c yn , 1 


- 


wnich is the desired recursive relation. 


( 46 ) 
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However, equation (46) stir." requires a matrix inversion 
every time a new observat: - is made. This difficulty 

can be removed by using th 'inside out 11 lemma of numeri- 
cal analysis. First let P^ ' = C. (47) 

so that 


P nil C nil C nil “ C - c - + c -' 


n n -nil -nil 


-1 T 
P + c x C 
~n —nil —nil 


(48) 


Then by ohe lemma. 


'nil 


= - P c 


n 


n -nil 


1 [ £-n + l P n 4+1 + 


-1 


c P 
—mi n 


(4-9) 


Since £ n+1 P n c_ n+1 is a scalar, the problem has been sig- 
nificantly simplified. Substituting (47) into (46) gives 22 


A 

-nil 


^ + Cp ; 1+ 4 +1 cy. 


nil J nil 

A 


. A T 

c x ;] 
—nil — n J 


-n + P nil —nil ^ y nil —nil —n^ 


= - Vi —nii ^mi - y n+1 ] 


-n 


(50) 


Throughout the above discussion, the existence of the 
T 

inverse of C n+1 C n+1 has been assumed. This is analogous 

oo The observability condition discussed by Kalman in the 

case of state estimation of dynamic systems. 2 ^ 

To relate the recursive formula (50) to Stochastic 

Approximation, premultiply equation (48) by p and 

nil 

posumultiply by P^ obtaining 
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P=P Ft o 1 c Pi 
n nil L “ -nil —nil n J 


rji 

Postmult ip lying again by c. gives 


= P ^ Col, + cl, 0. 


T 


or 


'n -nil nil ‘--nil ' ''nil -nil in -nil- 


P C ^ 1 = P C T Fc P C ^ 1 j- Tl“^ ("ill 

"nil —nil rrPn.ll "—nil n —nil ^ ) 


When is time invariant _c. n , 1 = c_ and equation (51) 


oecomes 


T 


T . 


? n+ l O' - P n o- [c P n c- + X] 

- p n -l ^ C2 P n-1 S. T + P 


(52) 


rn rn _ 1 

= P n _2 £ [3 £ p -_o c- 1 I] - 1 


- n-2 - 


= P Q c T C (nil) c P Q c T l I]" 1 (53) 


Equation (53) was obtained by repeated application of (52) 
to itself. For a large number of iterations ^ the asymp- 
totic versions of (53) and (50) are 

T 

P c" 

rp Q rn _ n 

p n+l o- - n+1 [c P Q e T ] 


n large (54) 


— n+1 ” P + tgl C = P o ^n+l ' x n ] (55) 


P_c' 

o— 


Tn-1 


A 


rp rp _ -j 

Since P o c~ [_c P Q _c~] ~ is simply some constant k x 1 


T 


vector, each element of P n c_“ is just a constant 



divided by n+1, for ” large. 
Therefore, 
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4im P- c. 0, k x 1 vector 

n~»co 



wnich are the vector equivalents of the properties 
required of the sequence {a n } in Stochastic Approxima- 
tion methods, and P c T plays exactly the same role in 

-1 . -1 

algorithm (50) as does a^ = & in the R-M algorithm of 
equation (1 . Denoting ? n c T by y (y^ . . y^) , (50) 
becomes 



A 

n- 


+ 1 ( 7 ’!* 


y n ) Cy„ - c , _ x .. 1 
y n y ^ n — n-fl — n-i J 


where y. depends on all past measurements. The presence 
of P o allows °^e to use any available a priori knowledge. 
For example, if the confidence in the initial es.timate 

A 

-o is low, choose P Q = I. It has been shown experimen- 
tally that Stochastic Approximation algorithm (50) con- 
verges much more rapidly than any of the previously 
mentioned acceleration techniques. 2 ^ However, two 
iterative computations, equations (50) and (51), are now 
required; thereby paying for the increased acceleration 
with computational time and complexity. It should be 
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emphasized ohat algorithm ( 50 ) is valid only when the 
parameter vector x. is time invariant. 

Similar results hold in the time varying case. 

For 


y,. 


= C. X. 
— i —x 


~ * 2Ei an< 3 ® is a known transition matrix, these 
results are 


■n+1 


n+1 


5 ^ + "n+1 -S-n t+1 4 + H' 1 (*n+l ~ =„+l4 n ) 

(56) 

"n+1 " N n+L 4 C =n H n+1 + n' 1 = n M n+1 (57) 


N = © p §■ 
n-fl n " 


(58) 


These tnree equations define the adaptive estimation pro- 
cedure alluded to at the end of section 2.3. They are 
valid when £ and x are time- varying and can easily be 
adapted to the case where I is also time-varying.^ The 
similarity between this estimator and Kalman's estimator 
is striking. 



V. CONTINUOUS TIME STOCHASTIC APPROXIMATION METHODS 

The purpose of developing continuous time Sto- 
chastic Approximation algorithms is to provide differential 
equations analogous to the difference equations (6) and 
( 9 ) . These differential equations can then he implemented 
on an analog computer. 

5*1 The Continuous R-M Algorithm 
' By writing equation (5) as 

x n+l - x n “ - a n y( x n ) 

and considering the limiting case,, we obtain the differ- 
ential equation 2 ^ 

*“ " a (t)' y(x(t)) (59) 

in which a(t) must satisfy 

CO w 

lim a(t) — 0, J* a(t) dt = » and J a 2 (t)dt < “ 
fc_>co o o 

Ihe multidimensional version of (59) is Just 

x(t) = - a(t) £(x(t)) (60) 

vith the same restrictions on a(t). 
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For example, a suitable choice, of a(t) is 

»(*) “ * - 0 ■ 

Both of the above algorithms converge in mean square and 
with probability one under slightly more restrictive con- 
ditions than the discrete time analogs. 


5 . 2 The Continuous Time K-W Algorithm 
By writing equation' (9) as 

,y(* n + c n ) - y(x n 


x , n - x - a [- 
n-f-1 n n 


c n> 


2 c 


1 


n 


and again considering the limiting case, we obtain the 
one -dimensional K-W differential equation 

y(x(t) + c(t)) - y(x(t) - c (t ) ) 


k igil = . a(t) c : 


2 c(t) 


(61) 


in which a(t) and c(t) must satisfy 

» CD 

lim a(t) = 0, J a(t)dt = ~ 

o 

CO . . 

lim c (t) =0, J* CfHf] 2 dt < » 

nr»<» o . ' • 

The. multidimensional' version of algorithm (17) is 


x(t) = - a(t) A £(t) 


(62) 
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where 

Ax(t) = C £ (/(t)) -x(2 _1 (t)), .... X (i m (t)) - z<£ u Wys6fr) 
where 

x J "(t) - xfa) + c(t) and x~ J '(t) = x( t ) - c(t) e_j 

0 *” • • • } HI 

These two algorithms also converge in mean square and with 
probability one, but again under more restrictive condi- 
tions than the discrete time schemes. 

In the next chapter, application of Stochastic 
Approximation methods will be made to various engineering 
problems. 



VI. ENGINEERING APPLICATIONS 

The basis for application of Stochastic Approxi- ‘ 
mation methods to engineering problems was laid in sec- 
tion 3.2, where the minimization of a performance index 
was formulated as- a regression problem. However, the 
presence of constraint equations was not considered, but 
can be easily included using Lagrange multipliers. 

Assume it is desired to minimize 

L(k) “ Ey U(y,k)} 

subject to the constraints 

E^(k) = Ey{f^(y,k) } =0 i=l, . .., M < m 

where m is the dimension of k. ■ Then by defining the 
auxiliary loss function 

1 T 

= X + V 1 f 

is a M x 1 vector 

f is a M x 1 vector of constraints 

and using this new loss function in equation (21), we 
obtain 

‘ > 

— n+1 = -n " a n Vk^^n-fl' -n' -n) ( 63a ) 

~ —n ~ a n V^n+l^n? “ a n ^n V k f(y n+1 ,iS n ) 
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Equation (63) is a function of X_ n . which is given by the 
companion algorithm • 

i-n+l " ^n> ^ 

mxl 

where b n must be a Stochastic Approx mat ion sequence and 
V k f is a M x m matrix. 

Inequality constraints can also be handled, but 
require the introduction of an additional vector variable 
that converts the inequality to an equality constraint. 
The result is three interdependent algorithms. 

With this foundation, some application of Sto- 
chastic Approximation methods will be presented.. 


6.1 Coding Theory 

Schalkwijk and Kailath 27 considered the problem 
of transmitting one of M possible signals, where each 
' signal takes T seconds 'to transmit, over a noisy chan- 
> nel without memory with the availability of a noiseless 
feedback link (such a situation is typical of a satel- 
lite-to-ground transmission) . It is important to remem- 
ber that the feedback path can not increase the channel 

28 

capacity as was first shown by Shannon, but does 
simplify the complexity of the coding and decoding required 

to achieve a given performance. 

To begin with, the communication is assumed to be 

over a forward channel with white Gaussian noise of 
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spectral density ■— and a noiseless feedback channel. 

The message information is transmitted by modulating the 
amplitude' of N orthonormal waveforms cp^t). 


! cp i (t)q>.(t)dt - 6.. 


lj ~lj 2, • • • , R 


Since the time allowed to transmit the information sig- 
nal is T seconds, these waveforms might represent N suc- 
cessive and non-overlapping pulses of duration T/lf. 

Thus, the information signal transmitted has the form 

N 

S(t) « £ -d.cp .(t) 
i=l 1 J 

and the received signal is 

Y(t) = S (t ) + N(t) 

Reception is then achieved by using filters matched to 
the ‘waveforms cfn(t), giving as outputs 


Y, + N. 

1 2 . 1 


where, due to the assumed structure of N(t), the are 
zero mean stochastically independent random variables 
with variance 

• This procedure is valid even if the original 
channel is a continuous time channel because the matched 
filter for white Gaussian noise computes the likelihood 
ratio, which gives a sufficient statistic, and therefore 
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preserves all the relevant; information in the received 
waveform required for the. decision process. 

The coding method for sending one of the M pos- 
sible messages consists of dividing' the' unit interval 
[0, 1] into M disjoint equal-length message intervals. 
Then select as the "message point" cp^., the mid-point of 
the kth message interval, i.e.', 

2k - 1 

65 2M * k = 1, . . . , M. 

Now by transmitting the code point cp^ via successive 
s ignals 4 ( t ) , j *= 1, , .. N. At the receiver, an 

estimate of cp, is formed from Y. = 4 . + N. . lotting a 
denote the estimate of cp^ after receiving n values of Y^, 
the mean square error is 

( a n ~ ^k) 2 ^ n = 1, 2, ..., N 

which decreases as n increases. At the conclusion of the 

Nth transmission a 'decision is made as to which code 

message cp^ was transmitted by choosing the coding point 
/ 

closest : to cu.. The error probability P is then given by 

l a n “ ^kl- 1/2 M} - " 

The goal is to invent a coding scheme such that 

for any given e > 0, we have P < e for a transmission 

e 

rate R less than the channel capacity of the Gaussian 
noise channel, which here is assumed to have an infinite 
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bandwidth and with the usual constraint on the average 
transmission power P . For this channel, the capacity 
is 


avg 

2N Q 4n2 


and the transmission rate R - 


bits/sec, 

l06 2 M bits 
T sec * 


It is not possible 'to achieve the above-stated 
goal by simply transmitting cp k , with ,4 ^ = cp^, i = 1, '. . . , 

N and using a fixed rate R'less than the channel capacity. 
However, since a noiseless -feedback link is available, 
the receiver can re-transmit a , its current estimate of 
cp k back to the transmitter. Thus the transmitter can 
simply transmit a correction term to the receiver. Then 
since approaches •cp^. as n increases, the average power 
(in a statistically considered sense) needed to transmit 
the correction decreases as n increases from 1 to N, 

This saving of average power is sufficient to achieve a 
transmission rate arbitrarily close to channel capacity 
while keeping P g as small as desired by increasing T. 

This is the idea behind the method of Schalkwijk and 
Kailath. 


Specifically, they begin by taking a^, the first 
estimate of cp^. as 0.5. The receiver feeds back this 
estimate without error to the transmitter, which then 
generates an error signal^. such that 
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*X = '^ a l ** < Pj E ) “ 0(0*5 - cp to .), p> 0 

The signal ^ is then transmitted and observed at the 
receiver as 

' Y 1 = ^1 + N l = P(°*5 - cpj -3- if 


The receiver now computes the second estimate 


-1 


' a 2 “ a l " -V Y 1 


where £ is a constant that is chosen to minimize the vari- 
ance of the estimates a n « From' section 4.2 equations 
(37) and (38), this minimum is achieved by choosing = f3 
Therefore, a 2 = a ± - •|-Y 1 which is now re-transmitted to 
the transmitter,' where the correction is made and sent as 
A, = P ( a 2 ~ Vjj.) • - Again the signal received is > 

Y 2 + W 2. s ^( a 2 “ <Pv) + No , 


ahdciheo.re'ceivero computes • 

“3 8 a 2 ‘ 2 Y 2 
In general, then, one receives 


and computes.-' 


Y •= x) •+ K 
n • n n 


Vi '.vlv 


(64) 
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The estimate a n+1 is then sent back to the transmitter 
which will transmit 


^n+1 s ^n+l “ ¥ fc ) 

This coding scheme is diagramed in Pig. 3 . Note that 
equation (64) is Just a R-M algorithm with 


M(a) « p(a - cp k ) and Y n (cx) = M(a) + N n 


t 

Therefore, we know a n+1 converges to cp k in mean square and 
with probability one. 

Without going into further detail, the results 
of this coding scheme will be summarized; 

1. . For any rate R less than C, 


P~ 2 erfc 

c: 


?/3e( C - R > T , 

1 e 1.577 j 


2. This coding scheme achieves a given P for 

e 

a rate R with a transmission time T approximately one- 
tenth as long as required for the same P^ and rate R with 
orthogonal coding and no feedback. 

3* If delay t in the feedback path is 
included, the performance deteriorates negligibly so 
long as t « T. 

In the previous application, the channel was 
assumed to have no bandwidth constraint. For the same 
problem, except where the channel is bandlimited. 
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.Schalkwijk and Kailath’s R-M coding scheme gave the 
first deterministic procedure to achieve rates up to 
channel capacity . 

8.2 Filtering and Predictio 

The filtering and prediction problem is essen- 

\ 

tially one in system optimization. Here the attention • 
will be primarily devoted to filtering. This problem 
reduces to 'finding a matched filter for a noise corrupted 
deterministic signal and a Wiener "filter for a Gaussian 
signal in a noisy Gaussian background. The foundation 
for this application was -laid in section 3*2, where it 
was noted that the only restriction on the loss .function 
£(•) was that it.be strictly convex. For simplicity, the 
old standby error squared criteria will be used here, 

A(e) = e (t,k) . 

'^The parametric filter form, is • shown in Fig. 4, 

where ' 

t 

Fiit) = j* h ± (iO x (t-^dV, ‘ i = 1, m 

x o x 

are fixed optimum filters- for a given set of m different 
.conditions on the signal and noise. The goal 'is to 
recursively adjust, the variable parameter set k_. as some 
environmental or system condition 'changes, say, the 
noise power level or noise distribution function, so that 

L(k)- = EU[e(t,k) ]} =E{e 2 (t,k)} 






6o 

is minimized. Thus we seek the solution to L(k) = o, 
but this is impossible since the distribution function of 
the error is not^assumedbtosbewknowffvietjsiLngcStoo&astic 
Approximation, we iteratively solve 4[e(t,k)], but 
this requires the availability of S(t). However, the 
problem can be simplified so that it is not necessary .to 
observe S(t) to select the optimum k as is indicated in’ 
Pig. 4. Assuming the signal and noise are uncorrelated, 

L(k) = E{[S -S f) .= E{[x - % - N fl 

= E{[x - S] 2 }. - 2E{N(S + N)} + 2E{N £} + EfR 2 } 

= E{[x.- k T P] 2 } - EfH 2 } +“2k T /WoR^Odr 
Therefore, 

7 k 1 -(is) = E{[x -- k T P] 2 +.2 A(T)R M (T)dT 
However, - 

r T 2 

Vfc E{[x - k P] } still can not be computed because 
the probability distribution' function of x(t) is not 
assumed to be known, even though the filters F (t) were' 
designed for Gaussian noise. But the reason for having a 
parameter vector to adjust is because N(t) may not be 
Gaussian. Regardless, we do not assume a knowledge of 
the structure of N(t). Therefore, to solve 

7 k E ^ X "- t. 2 'A(T)E HN (T)dT '= 0 
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we must iteratively solve 

V k Cx - k T F] 2 + 2 X?h(T)R ra ( T )dT - 0 
or . 

' - Cx - + ; MOBmjOOar- = o' (65) 

t 

where the autocorrelation function ^(r) of the noise 
is assumed to be determinable. The R-M algorithm for 
finding the optimum k is’ then 


£n+l- “ + a n &*(*> - k®?(t) ] F(t) + K} (66) 


where * • 

S = / V t ) h nh(’ 1 )< 5t 


• In the case of detecting deterministic signals, 
the matched 'filter h^.(t) is approximated by a-linear 
combination of known functions cp^(t) 


h 3<*) = A k i.1 M*) 


where the subscript j corresponds to the filter matched 

to the jthi deterministic signal. An analysis similar to 

that above then gives the optimum k.. 

0 

By using the continuous Stochastic Approximation 
differential equation corresponding to the difference • 
equation (66), the optimization of k may be simply imple- 
mented on an analog computer. 
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6.3 Estimating Probability D6nsj-~bi.es and Cor relation 
Functions^ 0 - 

The estimation of an unknown function y = f(x) 
from a finite number of randomly observed points of the 
input data x(t) which may also -be noise corrupted can be 
solved using Stochastic Approximation by assuming that 
f(x) may be represented or approximated by a sum of. arbi- 
trary independent functions cp i (x), so that 

f (x) = k T £(x) *« .2 k i cp i (x) ( 6 7) 

where k is our variable parameter vector. For simplicity, 
let the cp i (x) be orthonormal and choose k to minimize 

L(k) - J Cf(x) - k\(x)] 2 dx 
x 

by again solving 

V k L(k) = 2/ [f(x) k T c£(x)]^(x) dx '= 0 

X 

- 2j f (x)c£(x)dx - 2k = 0 

x 

because the cp i (x) are ofthonorinal. Therefore, .L(k) is 
minimized at 

k = J* f (x)cg_(x)dx » E{£(x) 3 
~ . x 

but f(x) is unknown, so use the Stochastic Approximation 
algorithm to solve 

A 

E&(x) - k} ~ 0 



63 


The necessary recursive relation is simply 


Sn+l “ in + a n ‘ V. 


or its 'continuous analogue 


= a(t) CcaCx(t)] - k(t)] 


(69) 


To estimate a correlation function R(t) = E{x(t-Pr)x(t)}. 

of the random process, x(t) , t tthen f(x(t)} is unknown,, one 
applies the algorithm v 

R nil^ = + a h Cx(n+r')x(n)* - B^t)] 


dR. (t ) 

— — = a(t) .Ex(t+r)x(t) - R t (r)] 


(70)' 


6.4 Identification 

There are many examples where Stochastic Approxi- \ 

• 2 6 3' 1 

mation methods can be applied to system identification. 1 ■ 
Here the elementary’ case of identifying a causal time 

invariant discrete system will be considered.' If the . 

i ■ , * « 

input is applied at' n = 0, the output x'(n)’may be written 
using the- convolution summation as 


— in 

x(m) « 2 k.u(m-i) « k u 


i—o 1 > 


The identification. -procedure consists of determining the 

T 

weighting sequence k n , , ,.,.k w denoted by k_ by observing 
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the output x(t) which may be noise corrupted and mini- 
mizing some convex error criteria L(k) = E{X(e(t))}. 
Invoking the methods of Stochastic Approximation, one 
obtains 

k n+ i - k n + a n X»Cx[nl - k£ u[n]] u [n] 

' » 
and we know 

m 

X . i . m k = k — C k Q , . • * , k^ 3 
nr*°° lJ 


■30 

6 , 5 Dual Control 

This is 'one of the most difficult problems in con- 

31 

trol theory and was essentially defined ,by Fel’baum 
using the decision theory approach. The goal is to con- 
trol a. plant with unknown parameters and external distur- 
bances. 'Fel’baum’ 1 s approach is almost impossible to apply, 
even if the : a’ priori distribution of the plant parameter 
and' the external influences are given, except in simple 
cases , 

A more general approach that is somewhat less com- 
plicated than Fel’baum’ s and requires less a priori knowl- 
edge is a Stochastic' Approximation formulation. 

Given the. linear discrete system 


M N - 

x(n) = £ C.x(n-i) + 2 d.u(.n-i) 

i=l *■“ i=l 1 


a C T x + d® u 
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where _C and _d are unknown. 

Define k = (C^ ..., c^, d^ ..., d R) 

and Z(n) = [x(n-l), ..., x(n-M) , u(n-l)., . . . , 'u(n-N) ] 
Therefore, x(n) = k^Z(n) and choose the loss function 
^ ( * ) to be a convex function 

'4 = 4[x(n) - k T Z(n) ] 

We want to find the solution to v k E{£} =0 by iteratively 
solving v k &(• )' = 0. 

Using the R-M method, the convergent identification algo- 
rithm is. 

\ = 4-l' + . a n V C *( n+1 > Z(n)3 

\ 

■ — n-1 a n ^' ,[x (n+l) - £ T (n-l) Z(n) ] Z(n) ' (71) ‘ 

where a 1 denotes the derivative of SL with respect to its 
argument . 

The controller is designed to generate a control 
law of the form 

' ' r p 

u(n) =*£ T m[x(n)] = 2 8. m. [x(n) ] 

i-1 •=-, , 

where the m^ are linearly independent functions. The con- 
trol performance index I(k Qpt , £) is 

= EtJCx = k^ pt Z, u(x, £-)]} 

where J is a convex loss function., Row we wish to find the 
•2-opt that sol ves. VgI(k opt , 1) « 0 using _ only knowledge of 
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V g J[k^ pt z, u (k^ pt Z, £.)], but k opt is not known so the 

• — ' m * ' 

best one can do is use Vg J[k n-1 . Z.( n ) > u ( k n _i^ JL) 3 
or equivalently 

V 3 J ^-n-l •^( n ) > ^ T H^( n )33 

Thus the algorithm for finding the optimum J3_ is 

In " in-1 + V [ - T - (n) ' £ n-l 2^-1 Z( n ) D 

= £ n _ 1 + b n V p J{x(n-l) + u(n-l)} (72) 

which gives the convergent control algorithm. Note that 
the equation of identification (71) and that of control 
(72) are interdependent. Their block diagram representa- 
tion is shown in Pig. 5 • Analogous continuous algorithms 
can easily be derived for analog simulation. 
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80 

6 . 6 Controllable Parameters - ^ 

A common problem in control systems and in mass 
production of, say, missile components is to adjust a set 
of controllable parameters k to minimize the influence 
of uncontrollable changes in a set of parameters c_ on 
desired system performance. For example, c_ may be the 
pole and zero locations or gain and k may be the state 
variable feedback coefficients. Or c_ may be the mean 
values and variances of a set of variables and k_ may be 
the adjustable means and variances of a set of control- 
lable parameters . 

Thus, we define a performance criteria l (c ,k) 
where the variations in _c may be random, but stationary. 
The attempt is to find the value of k that minimizes 


E{r(£',k)} = J* I(£,k)dF(cJ = J (k) 
c 


This problem can be solved in general even if F(_£) is 

unknown by applying Stochastic Approximation, obtaining 

the algorithm, where c n _x is obtained by continuously monitoring 


xt . 


k n = £n-x + a n v k I[ ^ 


-n-1;— n-1 J 


( 73 ) 


o o op 

6 . 7 Allocation of Limited Resources^ J 

This last application deals with an Operation 
Research problem in reliability or allocation of limited 


resources . 


It is desired to find the optimum method 
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p(x) = k T <P.(x) of distributing a limited quantity of 
.resources x in which we wish to maximize the expected 
gain G 

G = E{g[p(x),x]} 

under the constraint on the resources 

J* W(x)p(x)dx = c 
x 

where ¥(x) is a weighting function, say, 1 in this exam- 
ple . 

When the probability density function f (g) or 
equivalently f(x) is not known, it is common practice to 
seek a minimax solution. By applying Stochastic Approxi- 
mation, we can avoid this conservative approach. 

It is first necessary to guarantee that the con- 
straint 

Jp(x)dx - 6 = k T /<£(x)dx - c 
x x 

T 

= k B - c = 0 

is satisfied. This is easily accomplished by using 
Lagrange multipliers, giving 

I = G(k) + X(k T B - c) 

and seeking the solution to 


V -° 
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by recursively computing 

— n = 4-1 + a n Svi £ 8[k^. 1 t(>(x n ),x n ] + X.^} (74) 

X n “ x n-l + V^-V' 0 ) - (75) 

or using their continuous counterparts 

dk 

dt = a ( t )^ v kS[k T (t)cp(x(t)),x(t) ] + X (t )b} 

|| = b(t)[k T (t)B-o] 

The block diagram for equations (74) and (75) is shown in 
Fig. 6. The unusual characteristic of the schematic is 
that it is in essence a perceptron, a device originally 
devised by Rosenblatt^ in his work on artificial intel- 

■ 34 

ngence. Here., however, Rosenblatt's threshold functions 
have been replaced with the linearly independent func- 
tions CfK 

In concluding this chapter, it should be noted 
that the techniques discussed in Chapter IV on acceler- 
ating Stochastic Approximation schemes may be used in all 
the applications considered. 
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Circuit Realizing Equations (74) and (75) 

Pig. 6- 





CONCLUSION 


This paper has been in essence an attempt to deal 
with many topics in optimization theory from an algo- 
rithmic viewpoint suitable for computer solution. Such an 
approach is especially useful in complicated engineering 
systems where the only analytically feasible solution 
requires simplifications that make the results meaning- 
less . 

It should be pointed out that many other research 

topics which are appropriate for Stochastic Approximation 

methods have not been presented. Some of these subjects 

are pattern recognition, random- rounding computer errors, 

quantal response in biological systems, learning control 

systems, inertial and non- inertial non-linear system 

identification and control, process control, estimation 

in radar and radio astronomy, trainable threshold logic 

and probabilistic automata. In addition, the Stochastic 

Approximation algorithms considered contain the Potential 

Function method of Aizerman, Braverman, and- Rozonoer as 

37 

a special case. 

In closing, areas of future research will be 
cited. A few of these are: development of a 
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.(1) Stochastic Approximation Newton-Rapson 
Method 

(2) Stochastic Approximation Conjugate Gradient 
Method 

(3) and extension of Stochastic Approximation 
Methods to function spaces as has been done 
for steepest ascent methods. 

A forthcoming paper on self-adaptive filtering and pre- 
diction will describe original results which are a direct 
consequence of this study. 
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ABSTRACT 


In this report the problem of self-adaptive optimal 
estimation of a sampled stochastic signal observed in random noise 
is formulated and an engineering solution is jleyeloped . Chapter I 
introduces the topic and reviews the results of recent research. 
Chapter II gives the necessary background material from estimation 
theory. Chapter III develops the learning criterion and derives the 
adaptive stochastic algorithms from it. The learning criterion is 
based on the principle of orthogonality of Chapter II. Chapter ‘IV 
presents the experimental results obtained by applying the learning 
criterion and associated algorithms to specific systems. 
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CHAPTER I 


INTRODUCTION 
lvl Prologue 

Recently considerable attention has been directed toward self.- 
adaptive (p r self -learning) optimum systems.. The basic idea is quite 
simple: one wishes to design a system to perform efficiently in an 

unknown or changing environment without the necessity of direct human 
intervention. Such systems are extremely important in the context of 
control and communication theory where it is often impractical or impossible 
to obtain the a priori information required to specify the optimum system. 

In this report the goal is to provide a self-adaptive solution to 
the problem of optimal filtering, prediction, and detection of stochastic 
signals imbedded in random noise. However, before discussing the prin- 
cipal results, a historical survey of this topic is appropriate. 

1.2 Background and Historical Survey 

One of the most important topics in control- theory is the stochastic 

control problem. Here one is required to determine - the optimum controller 

for a given plant without precise knowledge of the state x(t) of the 

\ 

plant. The stochastic approach to optimum control is motivated by the 
fact that in general - ' • 

1) Some of the state variables are not available for measurement, 

2) the measurements contain noise, 

3) the plant is subject to random input disturbances. 
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By using the state transition representation, a linear dynamic system 
model of the plant can be described by 

x(n+l) = $x(n) + Dy (n) + w(n) ( 1 . 1 ) 

and the measurements of the state x(n) by 

y(n) = Hx(n) + v(n) (1.2) 

where 

x(n) is the system m x 1 state vector 
p (n) is the Z x 1 control vector ‘ 

w(n) is the m x 1 white perturbation noise input vector 
• $ is the one-step m x m state transition matrix • - 
. D is the m x Z control matrix 
■y(n) is the p x 1 measurement vector 
v(n) is the p x 1 white measurement noise vector 
H is the p-x m observation matrix. 

The approach (Lee, 1964) generally used to attack this problem is to 

‘ ' A 

first estimate the state x(n) . Then this estimate x(n) is used as if it 
were the actual state to. calculate the optimum control employing determin- . 
istic methods such as the maximum principle. In other words the stochastic 
control problem is separated into two phases,- referred to as estimation and 
control. It has been proved that for linear systems with a quadratic per- 
formance index and subjected to white Gaussian noise inputs, the optimal 

, % i 

stochastic controller consists of an optimal estimator (filter) in cascade 
with an optimal deterministic controller (Joseph and Tou, 1961) . This result 
is known as the Separation Theorem. In this thesis, -only the estimation 
phase is considered because the deterministic control solution is well known 
(Shultz and Melsa, 1967 or Sage, 1968), In communication theory an equally 
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important topic is the stochastic detection problem. Here one is required to 

determine the optimal receiver for detecting the presence of a stochastic signal 
! 

£(t) imbedded in additive random noise v(t) . Assuming the signal x(t) has a 
rational spectrum, it is possible to represent it as the state of a linear 

s. 

dynamic system with a white noise input (Kalman, I960): The linear dynamic 

system is called the signal generating process, and in state transition repre- 
sentation is described by 

x(n+l) = $x(n) + w(n) (1.3) 

and the measurements of the signal x(n) by 

y(n) = Hx(n) + v(n) (1.4) 

rhe notation is the same as that of equations (1.1) and (1.2). The signal 
generating process (1.3) is identical to the control plant process (1.1), 
except for the control input y (n) . However, the Separation Theorem states 
the control term can be disregarded in the estimation phase. Therefore’, the _ 
estimation problem and its solution are identical for both control and com- 
munication theory. 

Also,- there exists an analogous Separation Theorem solution to the stochastic 
detection problem (Kailath, 1963) which states that for a Gaussian signal with 
rational spectrum observed in white additive Gaussian noise, the optimal sto- 
chastic detector consists of an optimal estimator (filter) in cascade with 
the optimal detector for a deterministic signal, i.e., the output of the 
filter is considered to be the actual signal. Again, only the estimation 
phase is considered since the deterministic detection solution is well known 
(Hancock and Wintz, 1966, or Van Trees, 1968). . 
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Because of the identical mathematical framework of estimation in a 
control or communication context, no distinction between the two areas is 
made in the text that follows. 

Wiener (1949) and Kolmogorov (1941) are credited with the solution for 
a single input-single output system. Wiener formulated the problem in terms 
of finding the optimum (in a minimum mean-square error sense) linear filter. 

He showed that a necessary and sufficient condition for optimality was that 
the filter - satisfy the Wiener-Hopf equation, and developed a method (spectral- 
factorization) for solving this equation for signals with a known stationary 
rational spectrum and for noise with a known stationary white spectrum. 

Following Wiener's pioneering work, there developed an extensive liter- 
ature which interpreted, simplified, modified, and extended his results. 
Detailed bibliographies may be found in Stumper (1955) and Balakrishman 
(1963) . 

However, the case of a non-stationary multidimensional signal -in non- 
stationary multidimensional noise remained unsolved in an engineering sense 
until 1960-1961 when Kalman (1960) and Kalman and Bucy (1961) published their 
fundamental papers. Instead of seeking a solution to the Wiener-Hopf equation ■ 
in the frequency domain with the attendant problem of spectral factorization, 
Kalman combined the concept of state variable representation of dynamic systems 
with the orthogonal projection in a Hilbert space representation of linear 
filtering to obtain a direct solution in the time domain. In contrast to the 
Wiener's method, Kalman's results are in recursive form and therefore ideally 
suited to real-time sequential digital computation. However, both the Wiener 
and Kalman theories require complete Knowledge of the message generating and 
observation noise covariance matrices, denoted by Q and R respectively. 
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In the real world such extensive a priori information is generally 
not available. The consequence of not knowing R and/or Q is a suboptimal 
filter, i.e., an increase in the error covariance matrix. In some cases 
the increase is unbounded (Sorenson 1966) . Detailed investigations of the 
suboptimal performance caused by insufficient a priori information have been 
widely reported in the literature, e.g., Soong (1965) , 'Heffes (1966), and ■ 
Nishimura (1966, 1967). In addition, the inverse of R must exist to perform 
the Kalman filter computations. The presence of either noiseless measurements 
or correlated observation noise can render R singular. In practice R is 
often ill -conditional simply because one measurement is an order of magnitude 
more accurate than the others. Thus the Kalman filter formulation can gener- 
ate application difficulties. Bryson and Johansen (1965) and Bryson and 
Mehra (1968) have modified the Kalman framework to handle this particular ' 
problem, but their technique necessitates state augmentation which increases 
the dimension of the filter and the computation time'. 

1.3 Statement of the Problem and Previous Results 

The inadequacy or absence of a priori knowledge leads naturally to the 
consideration of adaptive or learning approaches to optimum estimation. 
Specifically, a self-adaptive solution to the sampled data, stationary op- 
timum filtering and prediction problem is sought which does not require a 
priori specification of R and Q and retains the recursive features .of Kalman’s 
formulation. 

Previous adaptive techniques ‘can 1 be divided into, two types . The first 
due to Magill (1965) -assumes that the parameters of R and Q belong to a finite 
ensemble of a priori known possibilities. An optimum Bayesian pattern recog- 
nition algorithm for Gaussian distributions is used to learn which sampled 
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data process is being observed. With this knowledge, Q and R are uniquely 
specified. Magill's method is valid only for a scalar observation process 
and is cumbersome to apply. For example, given N unknown elements of Q with 
the single unknown element R, and M possible values for each variance, there 
are (N+l) combinations. Each combination requires the implementation of 
the corresponding Kalman filter equations. Hilborn and Lainiotis (1969) 
extended Magill's technique to a vector observation process and prove mean 
square and probability one convergence. 

The second approach is to estimate directly the components of R and Q. 
Shellenbarger (1966) showed -how to use the likelihood principle to ac- 
complish uhis estimation under the assumption of Gaussian distributions 
and other more restrictive requirements which limit its utility. As a 
result., Shellenbarger (1967) developed a more general least-squares learning 
method to determine R and Q . Proof of convergence is not considered. It is 
important, to note that both of these approaches require the determination of 
both the R and Q matrices, and the existence of the inverse of the estimated 
R matrix. Then the entire set of Kalman's equations must be solved for the 
estimated optimum filter each time the estimates of R and Q are updated. 

v. 

1.4 Approach to the Problem ' 

- ' ‘In this 'report,' an unsupervised learning criterion is formulated 
from which self-adaptive algorithms are derived. These algorithms learn 
the optimum discrete time stationary Kalman filter directly. This elimi-' 
nates both the necessity of estimating R and Q as an intermediate step and 
the need to solve the entire set of filtering equations. The number of 
parameters to be determined and the computation time is also reduced. In 
addition, the problem associated with the existence and computation of 



R is avoided. Satisfaction of' the learning criterion is shown to be a 
necessary and sufficient condition for optimal filtering. The stochastic 
algorithms developed for estimating the optimum filter converge in a mean- 
square and with probability one. The results are valid for scalar and 
Vector valued signal and noise processes. 

1.5 Organization of the Report 

. t 

The second chapter presents a comparison of Wiener and Kalman filter 
theory which serves also to introduce the notation to be used. The' review 
of Kalman's theory lays the foundation for the motivation of the learning 
criterion. 

Chapter III formulates the learning criterion and proves its necessity 
and sufficiency for optimum filtering. The stochastic algorithms required 
for performing the adaptation indicated by the learning criterion are then 
presented. The theory of Stochastic Approximation is invoked-to prove the 
convergence of the algorithms. An extension to time-varying signal and noise 
statistics is suggested. 

Chapter IV applies the theory of Chapter III to specific problems and 
presents the results of simulations which illustrate the success of this 
self-adaptive method for (1) different initial values of the filter matrix 
with R and Q held constant and (2) different values of R and Q with the 
initial choice of the filter matrix fixed. 

Chapter V contains conclusions along with recommendations for further 
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CHAPTER II 


OPTIMUM FILTERING 

2.1 Introduction and Organisation 

The objective of this chapter is to present several of the more 
important results from the theory of optimal estimation. The application 
of these results to the engineering problem of extracting a stochastic 
signal from noisy observations or estimating the state of a control system 
leads to the Wiener and Kalman theories which are developed and compared. 

At this point it is necessary to specify exactly what is meant 
by filtering, and prediction, of a stochastic signal x(t) observed in 
additive noise vCt) ■ 

Definition: Observe the sum z(nT) = x(nT) of the two random processes 

x(t) and v(t), representing the signal and noise respectively, 

- over the discrete' time interval ((n-m)T, nT) , n > m. Filtering 
is the estimate of x(~) at t = nT. 

► 

Prediction is the estimate of x(x) .for t >. nT. 

Both cases will be dealt with in the succeeding pages, but the 
greatest emphasis is placed on filtering because it is the key operation. 
Note rhat even though x(t) and v(t) may be continuous functions of time, 
the data z(nT) is observed only at discrete times. That is, in this 
thesis only sampled data is considered. 
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2.2 Optimal Estimation: Bayesian Approach 


To discuss optimality, a criterion of optimality must be defined. 
Suppose that a random variable x is to be estimated from the set of data 
Z = {z(l), ..., z (n) } . Then x will be called rhe optimal estimate of x 
given Z if and only if the average loss 

E {£ (x~x) } E z {E^ £(x-x)j Z } = E z (L( X |Z)} ( 2.1 ) 

is a minimum, where £(x-x) is an appropriately defined loss function. 

In equation (2.1) the expectation with respect to Z is not dependent 

A 

upon.x; therefore, it suffices to choose x such that 

L(x|Z) = E {£Cx-x)|Z} (2,2) 

. & 

is minimized. A solution based on minimizing (2.1) or, equivalently, 

(2.2) is called a Bayes estimator. It has been shown (Sherman, 19S5, 

1958) that for a rather general class of loss functions $,(.) and a 
posterior densities that the Bayesian estimator is ; .the conditional 

x = E {x| Z} (2.3) 

THEOREM 2-1. Let S = {£('): £ is symmetric and convex}. If the a 

posteriori density p(x|Z) is symmetric about its conditional mean 
E (x Z}, then the conditional mean E {xjz}is the optimum estimator 
of x given Z in the sense that it minimizes .(2.2) for all £ e S. 
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proof: 


£(e) = Z(-e) 

symmetry 

(a) 

A(ae + be ? ) <_ a z(e ) + b t(e 2 ) -V e ± > e 2 

convexity 

(b) 

where a + b = 1, a e (0,1) 



and p(yjz) = p(-y|z) 

symmetry 

(c) 


where y = x - E{x[Z} 
l/(x|Z) ; = E U(x-x)|Z} 


* E x {£(x-x)|Z} 

by (a) 

= Ey. {£ (x - E xjz - y) ! Z) 


* E y U(x - -E xjZ + y)]Z} 

by (c) 

= Ey a(E xjz -x - y[Z} 

by (a) 

= Ey U(E xjZ -.x *y) jZ 

by (c) 

= E ■ U(y -{ E xjz - x})jz} 



- 4' E {^Cy +{ E xjz - x} + l(y 


-{E xjZ -x})jz> 

by (a) 

>2E{5,(i y + {E x|Z - x} + 1 y 


- (E x|Z - x) )JZ} 

by (b) 

= E{£(y)jZ} with equality iff x = E{x|Z} 

Q.E.D. 


The class S can. be greatly extended if we add two restrictions to the • 
conditional density. 

THEOREM 2-2. Let S = {£(*): & is symmetric and &(e^) >_ £(e 2 ) > 0 for 
e 2 ~ e l ~ &(0) = 0}. If the a posteriori density p(xjZ) is 
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(1) symmetric and monotone nonincreasing about its 
conditional mean. 

(2) decreasing rapidly enough so that lim £(y) p(y[Z) = 0, 
where y = x - E{ xjz} 

then- E{ xjz) is the optimum Bayes estimate, 
proof: see Viterbi (1966). 

Some examples of the £(' ) e S arc 

ft, le [> k 
*(e) =< 

(0, [e [> k 
&01 .= K [ e I 
&C e l = JC [l exp C-e 2 }] ' 

Note that under the conditions of Theorem 2 the conditional mean E{ xjz} 
coincides with the maximum a posteriori estimate. 

In general What follows will concern vector-valued random 
processes x(n) . Equation 2.2 then becomes 

L(x(n)[ Z(n)) = E x U |x(n) -x(n)|J‘ Z(n)} (2.4) 

where Z(n) = (z(l), . .., z(n)) and || ' | j denotes the norm of the 
yector. Theorems 2-1 and 2-2 extend readily to include this case 
(Kalman, 1960) . 

If the error squared is chosen as the loss function, then 
restrictions (1) and (2) of Theorem 2-2 on the a posteriori density are 
unnecessary. 
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THEOREM 2-3. Let Z jj | x(n) - x(n) | | = j| x(n) - x(n)jj 2 = j?c(n) " 2( n )J 

jx(n) - x(n)j , then E (x(n) jZ(n)} minimizes (2.4) without any 
restraints on p(x(n) [Z(n)). 

Proof : 

<* T * __ * ^ 

E {£* - x~J p- - xjjz } = x x. - 2x E{X | Z> -5- E{x T x jZ} 

= (x - E* {x jz> ) T (x - E (x j Z) ) + E {x T x| Z }- £ E{ x|zfJ T E{x|Z} 
>_ e'{x T x I Z> - jE {x IZ>J T E(x] z) ■ 

A 

with equality iff x_(n) = E{ x(n) j Z(n)} • Q. E. D. 


2.3 - Principle of Orthogonality and the Wiener-Hopf Equation 

The contents of theorems 2-1,- 2-2, 'and 2-3 give the "in princi- 
pal" solution of the Bayes estimation problem for a wide class of loss 
functions and probability structures. However, the explicit computation, 
of this optimum estimate E{x(n) Z(n)} is formidable except in the 
important case when {x(n)> and {z(n)} are Gaussian. Here we have the 
well known result that E{ x(n) |Z(n)} is a linear function T £z(n)J of the 
observations z(*)> e.g., see Deutsch (1965). ’The optimal linear operator 
•T Jj(n)j can be determined using the orthogonal projection theorem 

THEOREM 2-4. Let {x(n}} and {z(n)} be zero mean .random sequences. Let 
Z(n) represent the closed linear manifold 1 generated by the data 
(B z(i) — , z(n) } where B is the general m x p generatof matrix 

fnr 7. fn'l 


- 12 .- 



If either 


(i) the random sequences {x(n)}, {z(n)} are gaussian or 

A 

(ii) the estimator x(n) is required to be a linear function 

T Oil o£ data {z(i), .... z(n)} and i J [x(n) - x(n) j j 

2 A 
I l*( n ) ~ *( n ) J ! • Then the optimal estimate x(n) of xfn) 

is such that the error e(n) - x(n) - x(n) is orthogonal to 

Z(n), I.e. , 

(x(n) - x(n), Bz(j)) = E{ x(n) - x(n)' T Bz (j)} 

= 0 Bz(j) e Z(n) ' ( 2 

where (*,’) is the inner product induced on Z by E{(') T (*)} • ' 
Proof: see Kalman (I960) 


COROLLARY 2-5 (x(n) - x(n), x(n)) = E‘ { j^x(n) - x(n)j ’ x(n) y = 0 

where x(n) = T [z(njj 

Under condition (i) of the theorem, the orthogonal projection of x'(n) 
on Z(n) is identical to the conditional mean E{x(n) j Z(n)}. Thus, this 
theorem implies that the optimum linear estimator can not be improved 
upon unless the random phenomenon are non-Gaussian and, even then, 
only by assuming knowledge of at least third order probability distri- 
but ion inunctions . Consequently we know the general form of x(n) in 

the sampled-data case is 



Given any random 
sequence with the 


sequence, there exists a unique Gaussian random 
same mean and covariance. 


( 2 . 6 ) 
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where A(n,v) is an m x p filter matrix. If the data acquisition rate 
is high enough to be considered continuous, then (2.6) becomes an 
integral equation. Regardless, A(n,v) is chosen to satisfy the 
orthogonal projection theorem. This is the method employed by Kalman 
(1960) to solve the optimal filter problem. 

Kalman and Bucy (1961) used this theorem, i.e., the orthogo- 
nality of e(n) and Z(n) , to derive the multidimensional Wiener-Hopf 
equation. The Wiener-Hopf equation is given by the outer product 
- ?(n) ) (z(j) = E{£x(n) - x(nj] z T (j)} 

= [°] ; *V5(l) £ Z(n) (2.7) 


Since x(n) is given by (2.6), (2.7) can be written 

H{ fx(n) - E A(n,v)l z(v). z ? (j)} = E{x(n) z T (j)} 

u ~ \>=i - 

n 

- Z A(n,v) E z( v ) z (j)} = fifl , J /z(j) e Z(n) 

v=i ~ ' — 1 


( 2 . 8 ) 


If i -co in (2.8), the sum is assumed to be uniformly convergent so 
the order of summation and integration may be- interchanged. For the 
scalar case it is obvious that the result given by orthogonal 
projection theorem equation (2.5) and the Wiener-Hopf equation (2.7) 
are identical. To show this equivalence in the random vector case, a 
dual space approach was used. The result is summarized in Theorem 2-6 
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THEOREM 2-6: A necessary and sufficient condition for 

E{ [x(n) - x(n)] z T (j)> = [o] ^z(j) e Z(n) (2.7) 

~ n 

where x(n) = z A(n,v) z(v) 
v = i 

is that A(n_,v) be chosen such that 

E (jx(n) - x(n) T Bz(j)> = 0 ^Bz(j) e Z^Cn) (2.5) 

Proof: see Kalman (1960) and Kalman and Bucy (1961) 

COROLLARY 2-7: E{^x(n) - x(n) x T (n) } =£oJ 

This theorem and the accompany corollary provides a common framework for 
rhe filter theory of Wiener and Kalman. 

2.4 Wiener and Kalman Filter Theory 

The purpose of this section is to present the results of Wiener 
and Kalman for comparison. The reader is referred to the appropriate 
references in section 1.2 for a' complete derivation. The model used 
is given by equations (1.3) and (1.4) 

x(n> 1) = <*> x(n) + w(n) (1-3) 

z(n) = H x(n) + v(n) 

with E '{w(n) w T (n) } = 0(h) ( 1 ^ 4 ) 

E (v(n) v T (n)} = R( n ) 
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Wiener’s approach uses the frequency domain and the solution is 
given in terms of the Z-transform A(Z) of the m x p filter matrix 
A(n- ) as illustrated in FIGURE 1. The problem of synthesizing the 
filter remains. Employment of the frequency domain requires the 
following restriction. 

'(1) The system $ and observation matrix H are time invariant. 

(2) - The statistics of w(n) and v(n) are stationary. 

(3) The data z(n) are known for past time, i.e., i = in 
equation (2.8).- 

Under these conditions, equations ( 2.6 ) and (2.9) become 

- n 

x(n) = E A(n-v) z(v) (2.9) 

v= -» 


and 


3 

To the author’s knowledge, the first -general technique for 

determining the optimum multiple input multiple output discrete 

filter using Wiener 1 s method was given by Motyka and Cadzow (1967). 

4 

Since only sampled data is. considered, frequency domain means 
the Z-transform domain. 
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and 

T n T 

E { ? (n) l } ■“ z A (*-v) E (z(v) z (j) } 

V = -CO ~ - 

n 

= R „C»"« ' z A ( n -v) Rzz(v-j) 

v = 



Je 


n 


( 2 . 10 ) 


By a change of variable (2.10) can be rewritten 

» CO 

R (a)..- • I, A( t ) R ( a _ x ) -f 0 

xz ■ T = o zz *- 

ihe cross- spectral (generating function) matrix representation of 
(2.11) is 

'^X2 CZ) “* zz (Z) A CZ) = [°] ' ' (2.12) 

Since each element of (2.11) is zero, each element of (2.12) is a 
polynomial in positive powers of Z only. Thus each polynomial element . 
must converge for all Z inside the unit circle. Assuming 4> (Z) has 

ZZ 

a spectral factorization of the form 


j| j ae{0, ...,<»} (2.11) 


* (Z) = A(Z _1 ) A T (Z) 

ZZ 

where a(Z ) is a p x p matrix whose elements represent the Z- trans- 
form of stable, linear, casual systems (i.e., polynomials containing 
a constant and positive powers of Z) and have no poles inside the unit 
circle, then a physically realizable Wiener filter exists. The frequency 
domain expression for this optimum filter is - 
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FIG. 2.! WIENER FILTER MODEL 




T 

A (Z) = 


T 

A (Z) 


1 


-1 


{A 1 (Z~ 1 ) + A _1 (2 _1 ) <j>_(Z)} (2.13) 


xz 


xz 


where { } is a matrix whose elements are constants and { } is a 

• ° + . 

matrix whose elements contain only poles inside the unit circle. The 
"in principle" solution given by (2.13) is not easy to synthesize, 
and is nor suited to machine computation. 

Kalman"s time domain approach not only eliminates these two 
difficulties, but also the three restrictions listed on page 9 . He 
used che orthogonal projection theorem to obtain the following recursive 
set of equations for optimum filtering and prediction: 


x(n) = $ x(n-l) + K(n) 


z(n) - H $ x(n-l) 


- E {x(n) | Z(n)} for Gaussian noise 

K(n) = S (n) hYV) 

= E(n|ri-1) H T H Z (n|n-l) H T + R(n) 

= Kalman filter matrix 


-1 


(2.14) 


(2.15) 


Z(n) = Cov {x(n) | Z(n)} 


= E { x(n) - x(n) 


x(n) - x(n) 


Z(n)> 


= E (n| n-1) - 2 (n| n-1) ir 


H Z (n[ n-1) H + R( n ) 


= Error Covariance matrix - 


(2.16) 


-1 


H Z (nj n-1) 


Z(n+ljn) = E { x(n-i-l) - $ x(n) 


x(n+l) - ® x(n) 


1 


Z(n)} 


(2.17) 


= $ E(n) t + Q(n-rl) 

= One-step prediction error covariance matrix 
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x(n+l[n) = $ x(n) 


= One-step prediction of x(n+l) 

= E (x(n+l) [ Z(n)} for Gaussian Noise 


The block diagram for the Kalman filter is shown in Figure 2.2. Note 
that it is in the form of a closed-loop feedback system. The necessity 
.of knowing the covariance matrices Q(n) of the white plant perturbation 
(signal generating) noise and R(n) of the white observation noise is 
obvious from inspection of (2.15) - (2.17). • 

When restrictions (1) - (3) required for the Wiener approach are 
satisfied, the Kalman filter is equivalent to the Wiener filter. This 
must be true from Theorem 2-6. However, the computational superiority 
of Kalman's method is still evident. It is interesting to determine 
exactly how the Kalman filter matrix K is related to the optimum 
Wiener filter A(n-v) . For this case 

n 

E A(n-a) z(a) 

a = -<x> 

From Theorem 2-6 


x(n) = $ x(n-I) + K 


z(n) - H ® x(n-l) 


or 


E { 


S { 


x(n) - x(n) 


z (n) } = E { 


x(n) - x(n) 


x T (n) H T + v T (n)]} 


x(n) - x(n) I x T (n) } H T = 


= E{ [x 


Cn) - x(n) 


v T (n)> 


( 2 - 


The left hand side of (2.19) is £(n) H from Corollary 2-7, and the 


right hand side is E {x(n) v (n)} since v(n) is independent of x(n). 
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Therefore 








Z (n). H T = E 


(x(X) v'Cn)} 


n 

E { Z A(n-a) z(a)v (n)} 
a = -“ 


n 


E { Z A(n-a) 
a = -« 


H x(a) + v(a) 


T 

v (n) } 


n rp 

S A(n-cs) E {v(a) v (n) 

a = _co 


= A(0)R 


which implies 


T 

A(0) = E(n) H R -1 


= K 


Thus the impulse response of the optimum Wiener filter matrix evaluated ' 
at rime equal to zero gives the optimum Kalman filter matrix. 


2 . 5 Summary 

This chapter has reviewed some of the fundamental concepts of 
estimation theory and its application. It was shown that for 
Gaussian noise the optimum estimator is linear. For a given system 
this important result yields the filter theory of Wiener and Kalman 
which was reviewed and compared. 
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CHAPTER III 


FORMULATION OF THE LEARNING CRITERION AND 
THE ASSOCIATED STOCHASTIC ALGORITHMS 

3.1 Introduction and Organization of the Chapter 

In Chapter II some of the Important concepts of estimation 
theory were reviewed, and the results of Wiener and Kalman filter 
theory were presented and compared. There it was shown that for 
optimum filtering the estimator must satisfy the Wiener-Hopf equa- 
tion. Thxs equation is also the fundament of the learning criterion 
to be developed in this chapter. Stochastic algorithms, based on ■ 
this criterion, are derived which asymptotically converge to the 
optimum filter. Stochastic Approximation techniques are invoked to 
prove this convergence. 

3.2 The Learning Criterion 

The purpose of the learning criterion is to provide a necessary 
and sufficient condition for an adaptive solution to the optimum 
filter problem when the signal and noise covariance matrices Q and R 
are unknown . In addition, this criterion must have two additional 
characteristics . 

(1) It must be a function of measurable and/or calculable 
quantities. 

(2) It must provide information from which convergent algorithms 
caii be derived . 

Otherwise , the criterion is meaningless from an engineering point of 
view. 
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THEOREM 3-1. 


Given the dynamic system - 
x(n*l) = $ x(n) + w(n) 


the observation process 
z(n) = H x(n) + v(n) 


and the filtering equation 

x(n) =$x(n-l) t Q(n) - H * x(n-l)] 


If K = K. 


opt 


= E H HER 


rp 

|_H E H + R 


-1 


That is, is the optimum Kalman filter matrix. 

Then, 


T 

■ n+i 


E { 5.;., Sj}^ C6 n+1 Sj).= 0 “V>n 


where , 

Si = z(j) - H <1 x(j), and conversely. 


(1-3) 


(1.4) 


(2.14) 


(3.1) 


This theorem is important because it implies that when K is not the ootimum 

n 

filter matrix, the residual process {5^} is not orthogonal. 

THEOREM 3-2. The Learning Criterion 

I£ E (6 n ^ <Sj }= 0 V 5-n and E{Xo) = 0, 

then, 

E {5 n+l si }« [o] Vi-n, 
and conversely. 
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Thus, the lack of orthogonality, when K is not equal to K is 

n ' opt 

reflected in a non-null correlation matrix between the residuals, 

E { 6 n+ l 6jT} ^ C t n+1 -^ H [o] j-n 


C3.2) 


C(n+l.j) could be used as a basis for learning K . if a technioue can 

opt 1 


be devised to utilize this correlation between <$ n+1 and. S n to adjust K 
such that K„ " K , as n . 


n 


n 


out 


Note first, the fact that the correlation between 6 and <$ 

n+1 n 

can be represented by the stationary Markoff- sequence 


6 = P 6 t e 

n+1 n ' n 


(3.3) 


where { e^ } is a zero mean random sequence. Post-multiplying both sides 


of C3.3) by 6 and taking the expected value gives 

TL ~ ° 


E < 5 n*l > = W{V n T > + E ( e »«» T ) 


or 


C(l) = P C(0) + E{ e 6 ?} 
■ n n 


-1 


Choosing the state transition matrix P = C Cl) C'(0) 


forces E-{<$^ sj) = M . P represenrs the correlation between 6 and 


n+1 ' 


If P can be forced to approach oj as n approaches 00 , then from 

theorems 3-1 and 3-2, approaches K . . Thus an algorithm is required 

which uses P to adjust such that P as n Equivalently, 

the adjustment must force 6 -> e as n -> 

J n+1 n 
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3 . 2 Development of the Adaptive Algorithms 


In the derivation of this algorithm the measurement matrix H is 
assumed to be invertible so that P can be written in the form 


P = H $ A K 


n+1 


(3.4) 


where AK ^ is an arbitrary matrix chosen to satisfy equation (3.4). 
Let the ini.tial value of the filter matrix be K °, then 


n 


x^n) = $x,<p.-l) + K °| 

n 


z (n) - H $ x 


x°(n-l)J 


(3.5) 


Rewriting eq (3.3) 


e = S' P 6n 
n n+1 • o 


= z(n+l) - H®x^n-1) - H ® A K° 

n+1 


Substituting (3.5) into (3.6) gives 

e = z(n+l) - H $ { *x^n-l) + K ^ 


z(n) - H § x(n-l) 


(3.6) 


H <p AK 1 • 
n* J- 


z(n) - H 4 x (n-1) 


> 


z(n) - H 4> xtn-l)J 

z(n+l) - H §{§ x(n-l) + (K ° + AK^ 0 ,,) j^z(n) ~ H ® x?n-l) 


n n+1 

= z (n+1) - H x?n-l) + |~z(n) - H $ x 

1 


o 

(n-1) 


} 


= z(n+l) - H ® x(n) 
1 


= 5 


n+1 


(3.7) 


Equation (3.7) implies that P = 0 in the equation 

1 o 

6 = P 6 + e = -e 

n+1 In n n 

and since 



r -o "IT ! 


E {e 

z(n) - H ® x(n-l) | } = 

0 

n 

L J 1 

— - 


•' I 


r - — r 

E {J^z(n+1) - H ® x (h) z(n) - H ® x°(n-l) 


J L. „ 
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(3.S) 



o 


From Theorem 3-1 and 3-2, AK „ is the correction to K reauired to 

n+1 n 

satisfy eqn (3.8). Therefore, under steady-state conditions 


K ° + AK = K 
n n+1 opt 

T T . 

However,' since the E { 6 ,5 } and E{6 <5 } are unknown, AK can not 

n+1 n n n n+1 

be calculated. 

But P and, therefore K „ can be estimated by using the method of ' 
n+1 n+1, 

stochastic approximation (Dvoretzky 1956),. A detailed survey of 
stochastic approximation is contained in Hampton 1969. 

To provide insight into the derivation of the stochastic algorithm, 
the problem of determining P is reformulated in a .performance index 
framework. Let L(P) be the expected value of the performance index 
to be minimized. 


UP) = E U (5 n+1 - « n+1 )} C3.9) 

~ T 

where £(6 -S n+1 ) = ^n^-1 " P ^n+l* P 6 n^ 5 is the P erformance index. 

When t(P) is known (the deterministic case), equation (3.9) can be mini- 
mized by solving 


iteratively 


V L(P ) = 0 
P opt 


P = P + V n L(P ) B 
n+1 n P v n n+1 


under appropriate convergence conditions. 


(3.10) 


(3.11) 
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In die case at hand L(P) is not known. This condition is precisely the 
motivation for stochastic approximation which states that (3.11) may be 
replaced by the random matrix sequence - 


n+ 1 


= A (6 


n 


n+1 


- 6 n+P 


n+1 


(3.12) 


where B == „ ty 

n+1 “n+1 n+1 


^ 3 n^ '*‘ s a sec l uence of real numbers such that 

°° oo 2 

a - 0 , E a =« and E a ' < °° 
n n=o n . n~o n 

and {W^} is a sequence of uniformly bounded linear matrix operators. 
Under the above conditions the random 'sequence generated by (3.12) 
converges to P ^ in mean square and with probability one. 

- ~ ^ 

Choosing a = 1/n and remembering P = H $ A K , then (3.12) 
n n n 

becomes 


Pi 1 = P + {(6 ° - P S °) <5 ° T ) 

n-1 n n+1 n n J n 


W 


n+1 

n+1 


(3.13) 


P + { 
n 


z(n+l) - H $ x(n) 


- P 


n 


z(n) - H $ x(n-l) 


> <S° T W 
n ti+1 


n+1 


= P n + { z (n+1) - H 


= P n + (z(n+l) - H $ 


= P n +’ (z(n+l) - H $ x(n) } 


= H * AK n + 6 n+ £ (6°) X Wn+1 


$ x(n) +A K (z - H $x(n-~I)) 
n n / 

^ ^ ^ q 

$ x(n-l) + (K + AK n ) (z(n) - H $ x(n-l)J 


>6° T W 
n n+-l 

h+T 


}6° Wn+1 
n n+1 


z(n) - H ® x(n-l) 


Wn+1 

n+1 


n+1 


(3.14) 
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and 


~ -1 T -1 T " 

AK =4 (H H) H P 
n+1 n+1 


In equation (3.13) the expression in braces is the gradient of the 

performance index and determines the direction of the - correction term. 

W is a weighting matrix and determines the magnitude of the correction 
n+1 

term. The choice of W^ + ^ is vitally important since it determines the 

rate of convergence of the algorithm. From a computational viewpoint it 

is more efficient to let W - 'be a constant matrix for all n. For this 

n+1 

choice the correction term merely follows the local gradient at each stage 
of iteration. From a statistical viewpoint it is more efficient to let 


-1 . -o o T 

{nW_ + 6 (6 ) } 

n n v n 


Jli (W - W 
n . n n 


o T 

n + (6 ) 
n 


W 6 
n n 


For this choice the correction term is such that the performance index is 
minimized at each iteration stage. Thus, intuitively (3.15) should converge 
more rapidly, but at the expense of computation -time. 


Instead of estimating P and then calculating AK as is required in 

opt 

using the stochastic algorithm of (3.14), it would be desirable to estimate 


K directly. .This can be accomplished by re-defining the performance 
opt 

index &(•)• Let 


-1 T -It 

A n-1 = 9 ( H H) H z(n+l) 


<5 = z(n) - $ x(n) and 

n 
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&(A , - K 6 f = (A „ - K 6 'J T f A 

n+1 n+1 n+1 n+1 n ^ n+1 


K s ) 
n+1 n 


Then 


K = K t 7,, L(K ) B , 
n+1 n K ^ n n+1 


(3.16) 


= K+ {A-K6}6 W , 
n n+1 n n n n+1 

n+1 


IT H exists (3.16) becomes 


K — K + { $ 
n+1 n 


-1 


— 

A 

’ — 

z (n+1) - $$ x(n-l) 

~ ^n 

z(n) - $ x(n) 

— _ 


.. 


-1 


n 


- 

! 

-1 ! 

= K + 

$ ! 

n 

t 

= K + 

§ _1 6 


n 


z(n) - § x(n-l) 


T 

}6 W . 
n ml 
n+1 


T 

n n+1 


z - 0 x(n) 
n+1 


T 

6 W „ 
n n+1 

n+1 


n 


(3.17) 


n+1 


where 


W 


-1 1 


n+1 


n+1 


-1 T 

nW +65 
1 n n n 


(3.18) 


Equation (3.17) with W defined as in (3.18) satisfies the convergence 

* * i A 

conditions of stochastic approximation. Therefore, converges to 

K , the optimum Kalman filter matrix, in mean square and with 
opt 

probability one. 
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3.3 Estimation of T, E_, R, and Q 

By including an estimate of C(0) in the computational routine, 
convergent algorithms for the error covariance I*;' the one-step 
prediction error covariance the plant perturbation noise covariance 
Q, and the observation noise covariance R, are easily derived. These 
algorithms are 


C (0) = C CO) +o 5 
n n- 1 n n 



- C CO) 

n-1 


(3.19) 


• where is chosen to satisfy the requirements of stochastic approximation. 


Z = K C (0) 
n n 


(3.20) 


r = (i - k ) z 
n n J n 

(3.21) 

A A— ^ A 

r = k r 

n • n n 

(3.22) 

A- A A ^IJ» 

0=2 - $ r f- 
n 

(3.23) 


Equations (3.20) through (3.23) are valid for H = I. Analogous 
results hold for H / I. 


In Chapter IV the experimental success of the algorithms derived in 
this Chapter are presented. 
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CHAPTER IV 


NUMERICAL SIMULATION OF THE SELF-ADAPTIVE 
FILTERING ALGORITHMS 

4 . 1 Introduction 

In Chapter III, the learning criterion for self-adaptive filtering 
was formulated and several convergent stochastic algorithms for performing 
this adaptation were derived. In this Chapter these algorithms will be 
applied to specific systems. The experimental -results were obtained on 
The University of Arizona's CDC 6400, using the FORTRAN IV language. In 
interpreting these results it should be pointed out' that double precision 
arithmetic was not used. 

4.2 Experimental Results 

Given the system defined by 


x , = $ x + W 
n+1 n n 


(1.3) 


z = H x + v 
n n n 


(1.4) 


with Q and R unknown, the optimum estimate of x is computed with a 

n+1 

Kalman filter of the form 


x 


n+1 


x + K 
n 


z_ .- - H§ x 
n+1 n 


(2.14) 


Since Q and R are not known the optimum value of K can not be calculated. 

> 

However, 10^ can be- learned using the algorithm 


X A 1 rn 

K = K- + 6 S W , 

n+1 n n+1 n n+1 

n+1 


(3.17) 
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with 



(3.18) 



Figure 4.1 illustrates the adaptive process ‘for an initial value of the 

Kalman filter 

K = 0.0 
o 

which corresponds to the one extreme of assuming the measurements are 
just noise, i.e., they contain no information. As can be seen as 
determined by (3.17) has essentially converged to K Q p t within 200- 
iterations, with -> K ^ as n -> «>. r, Z, Q, and R were also estimated 
using equations. (3.19) through (3.24). These results for 1,000 iterations 
are compared with their actual values in FIG. 4.1 

FIGURE 4.2 illustrates the adaptive process .for an initial value of K 
equal to 

K o = 1.0 

which represent the .other extreme of assuming the measurements contain no noise. 
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FILTER “ K( rj) . 





FILTER — K ( n i 








they are perfect. Again 1C has essentially converged to K within 

11 opt 

600~iterations. This indicates that algorithm (3.17) is not sensitive to 

the initial value of K . 

o 

Other 1st order problems were considered with $ being varied from 
0.2 to 1.0 (the threshold of instability), and with the values Q and R 
also being varied. In all cases considered converged to K Q p t within 
2,000- -iterations. 


EXAMPLE 2 : 2nd Order Plant 


For this case 


and 


0 = 





— ► 

— 

0.966 

0.000 


1.441 

0.738 

0.155 

0.894 

,0 

II 

0.738 

0.610 


K 


opt 


= I 



, R = 

I 



— 

2.00 

1.00 





1.00 

1.00 





0.600 

0.200 


rk n 



0.200 

0.400 


1 

t lc 

L_ 21 

k 22_ 


For this example K q was chosen to be 


K = 
o 


0.40 

0.00 


0.00 

0.60 


The results of the adaptation process is shown in FIG. 4.3 (a) through 
FIG. 4.3 (c) . Again the process has essentially converged in 1,000 


iterations. Note that symmetry was forced on k (n) and k- (n) . 

. 12 *^1 

The adaption process for the same system and K q without forcing 
symmetry is illustrated in FIG. 4.4 (a) through 4.4 (d) . 
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CO 
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ITERATIONS- n 


00 





0.30 


0.25 


0.20 


0.15 

c 

1 OJ 

.32 

ro 

> 0.10 


0.05 


0.00 


0.05 








00 0 





The random sequences {w } and {v n } used for simulating the adaptive 
process were Gaussian in all examples presented. Analogous results' were 
obtained for uniformly and triangularly distributed sequences. 
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•CHAPTER V 


CONCLUSION 

This report has presented a self-adaptive technique for learning 
the optimum Kalman filter matrix in an environment where the covariance 
matrices of the plant and observation noise are unknown a priori. A future 
paper will describe this technique in greater depth and extend its appli- 
cation to nonlinear systems and present the experimental results for higher 
order systems. 
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TIIE CONTROL OF NONLINEAR STOCHASTIC 


CONTROL SYSTEMS UNDER DISCOUNTED PERFORMANCE CRITERIA 
Introduction 

Systems described by difference equations (state equations) and 
subject to uncertainty as to how they will evolve are of interest in 
many fields including engineering and economics. The optimal control 
of such systems was first formulated by Bellman in 1958, and major 
contributions were subsequently made by Howard (1960) , Derman (1964) , 
Blackwell (1962, 1965), and Veinott (1969). However, while much 
attention has been given to the existence of a solution under various 
conditions, little work has been directed toward the development of a 
practical algorithm. It is the purpose of this report and the author's 
disserration to develop such an algorithm for discounted performance 
criteria. A fundamental study of stochastic control systems is made 
in Chapter II, which establishes the basis for the development of the 
algorithm in Chapter III. This report consists of these chapters and a 
summary of the example problems worked to date with a brief explanation 
of a proposed nuclear rocket control study. 


iii 



CHAPTER 2 


THE CONTROL OF FINITE MARKOV CHAINS 

1.1 Introduction 

A meaningful analytical examination of the stochastic control problem 
is found in considering the control of finite Markov chains. Dynamic plant 
equations and plant noise are modeled by a set of transition probabilities 
over a finite state space. Each control law is associated with a set of 
transition probabilities, and a cost function is defined. It is found that 
the cost function may be minimized by either dynamic programming or Howard's 
policy iteration. This chapter examines both these methods and the propertie: 
of the cost function under various control laws. 

2.2 Finite Markov Chains 

Let (0,3^ Prob) be a probability triple with Q the set of elementary 
events, w, } the o-algebra of subsets of Q and Prob the probability 
measure on & . The finite set of real numbers, X = { 1 x, 2 x, ♦** x } is 
called the state space and constitutes the range of the random variable x 
mapping a onto3£. A stochastic process is a sequence 

of random variables. 

The stochastic process X is said to be a Markov chain if for 

L n € J E n = { ® >7 S 6 > 

I * 

fc . 1 <■* A ^ a-v\ ~ L \ , 

whenever Prob [E Q f\E r\E^ ’ • ] i= 0. That is ^ 


1 



2 




a 


fhi Cn'i, 


where the pij (n) are the transition probabilities defining the chain. The 
transition (stochastic) matrix for the chain is 

= ^'pcjCvV)! n =o,\ , . - . 

The transition probabilities are related by the Chapman-Kolmogorov equation 


PC j Orrs v vCy= % "pvfeC m or) •, rc\ 4 r 

\t*i ' ' j 


(2.1) 


where 


pt^ C'OTVA*) == Vco\o Aft - J x | 


__ 

^ W; *“ 

/"*v 


- X 




^ v-a^n * 


Then 


Let 


A chain is said to be homogeneous if 
— p ~ CO'as\-c^\ 

t> (o^ = r^tj (o^-s v" 

/ r ■ ( r f) - Prob^^n — J X ] be the a priori probability 


that the chain is at state at time , and let 

jU (rt) = C / u > (r> ) •> ' ' ’ ' > f ^ 

be the row vector of all a priori probabilities at time tl , then 

jU(rt) = /Xfo} P( o^i) 

and, for homogeneous chains 

/X(n')- t o 



3 


The states are classified as 

(a) ^'X. is persistent if Prob ’X*\ ss * 

(b) is transient if Prob “ X 5>®xr,e. ^ \ ^ 


(c) ** 


is aperiodic if 


•j.c.J. - i- > 

and 


(d) '7C is ergodic if it is persistent and aperiodic (for finite 
chains) » 

A chain is said to be ergodic if all states are ergodic. Examples of 
state classification are given in Figure 2.1, where the transition 


probabilities are represented by arrows. 

The following theorem will be useful in examining the control of 

Markov chains . 


Theorem 1: For a finite homogeneous ergodic Markov chain with transition 

matrix , there exists a unique stationary probability distribution jX , 
and 

'^p > C jQh)— O.'i n — oo geometrically fast. 

Or in matrix form, 

n 3 

^ geometrically fast 

\\ 

(Doob (1953), Ch. 5 §2). 

Thus 5 n / 

jALrt) = jUio) T — ^ yU(o) IfJL b 
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or 

M ( r\k —v a 5 n co ^ 

and jji — JJ. "~P • 

2.3 Controlled finite Markov chains 

The dynamic system to be controlled has a finite state space 

• x = *•*,**, 

and is observed periodically (at every discrete time period) . At each 
time period a control, , which influences the behavior of the system is 
applied from a set of possible controls A, As a result of the application 
of the control € A with the system in state at time fc; 

there is a time independent, 

(1) stage cost Q K. A 00 incurred, and 

(2) transition of the system from at time to X fcvv e X • 

at time \ ~ te-H with 

= 'P'-o'o L?te»r • 

There is also a discount factor, jg, , O ^ j*>< \ ; whereby, the cost 

£(*,<*) for being in state and applying a control e{ T\ periods into 

the future has a discounted cost of at the present. 

Let U denote the set of control functions i K from ^ into A 
(i.e., implies 6 k for all ). A policy a , specifies 

a sequence of control functions for all time; ^ ^ Uo ^ ^ 

Thus, at time V , with the system in state Xj, , the control A 



() 


is applied. A stationary policy is a policy for which U.^= U , 
i.e. , ^ ^ UiU j • « • ^ tx IA*° 

Let L(u') - ( l ( u( t ^c)) > * . • j 1 C u^))} 

~ L ■$ i ( { ^-) j j » ° 6 > ^ ^ 7 J 

be the column vector representation of the stage cost for all states 
under the control UeXJ . Let PW be the 3VJ Harkov transition 
matrix for. the control tl in the Harkov chain established by the policy IS t, 

POh = t 

Thus s by the Chapman-Kolmogorov equation, the transition matrix from time 
to T = K is 

PC'TV'vO ~ \Ko^ '• • * VX,o^ *. 

for the policy \\ and the initial state the total expected cost vector 

is ' \fcf) = ( U"( f,ff) 5 cru> Tf) > u-(T 4 ^y 


where 


or 


VT ( r v i~V t 21- Pu, U„ ( )} | ^ (2 • 3) 


where "If - 


rx ( u ") ~i~ ^ j \J^ ( % U VN (Xw')') \c jTTj - 

(. c oa ji j 

= J-li “ «) 4- j>t^h j 2| j" J(-x * , u„(^\?*] 
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Thus „ 


v \"7 ; _N ; 

^ ^ ^ X „ *4 - 


orj, 1*1 vector rorrA, 


\ (lO - L(a 0 \ K> \ : '(a,>~\7( 7C ') 

. . I 

saf 

- V" ; *"* p' „ • n ,r vV\- .. - 


\ ! v. ' ' i -v. 1 \ C* ; 


- O-T £w.l \ , 


-> V. * ‘ '. \ \ 

^ k. *- cd } sj v, * u • f z or 


2 . L ZZz 


<- O-i- \s ll*iA - a_V— 


The cheorens In this seecicn nr a due no 31 


sonny. The proofs which follow Slack., o 




me rnsronz tnay provide Into ; u hekavim m she expense 


:rlor.. V C'h N , 


u. — cs c. 1 — jh 3 -< 2 Hw uyt/Oo o~ 2 ^, 


-ma. There exists a 7 - •.-' tuns nor an aroiernr > 


ary X- (\ % \T. 


and nny 'll <0 ”1. }~ 


• i . \ * l,-. 

\ 


‘ * i v % , 

! 


Proof . Consider the i ‘ element 



8 


^ et \ t (°0 ~ 2-1 C°0 ^"j for any o\ £ t\ , 

Obviously, 1^ L (oTp^ • Thus, the set ^ \ol<& 

has a lower bound of zero and hence a greatest lower bound for say ^t^A- 

The control function ■£ such that j-(S« )v o(^ satisfies the lemma. 

Theorem 2 (Blackwell) If there is an optimal policy TT *- ^o,U Vi **« ^ , 

there is an optimal policy which is stationary. 

Proof. By hypothesis, 

Y('V') 4 Nf for a11 

N i(V) =r LCuc ^ -V jfPWV(^ 

where T* = \ , . . - 

Mso, 4\j(\\Y 

thus, V y / L -t fo r \ ) Co. 0 s )V(^') . 

By the lemma, there exists a ^■<£'vJ such that 

> US) * v ^*) > 

and ^ oa. cys.m ^ _ 

V(vO >, MO + p w[ i 

> L(f| + ^ ?0T) l L U.') + f. p M U ,- ')l 

U.O + p(-0 L L(f> 

MO + . 


* 
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By continuing this process 

•a- i 


■+ ^ P N (-f) VCi^) . 

Ks p “( £ ) V *) -*■ O sir ‘ c - and ?"Xt ) 


is a stochastic matrix. 
Thus, as hi —*?■ Oo 


V(y*) > VCf) 

but since is optimal 

V(tc v> ) = V(^) 

p C” 3 

and is a optimal stationary policy. 


Theorem 3 (Blackwell) Let ^ ^ ^ Vj»o v « • a 

and IT ^ • If 

V(ir) $ Y<t\'> for a11 p 

then "VC is optimal. 

Proof. By hypothesis „ 

LCO -V|?,?(4)V^ >/ V&f) for all $• t ^ 

° r x \V\ pcx\-V\CvA«.V . _ 

L(M* p POf*) V CiO > for a11 ^ ^ 

a & h|( 1 K S )'7/ N/C^ f or € XT 

O ft 

LC-M l p'PC^-h V(tf)> Vsn k-rc-U T.^'vr 


Thus 


or 


Con 


or 


:inuing this substitution process for the policy TT^- ? r / „ 

- L •-y h , )UoiUi] . 

L(|,H ^Ptf,^LCfO+ •• ■ •» 

-V 6 M P(f,V- • K^N/CT'i i/'jTj 
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Again as [N PC -f\) P( ) * ' * )V(TT) ->* 

Each is an arbitrary element of 1/ ; thus as ) becomes 

the cost of any policy. That is . 

VC^') 4 V(t0s for any V\ . 

Thus is optimal. 


Theorem 4 (Blackwell) Let X ~^U 0 cl ^ and 

If V(T l') < M(lf} > then 

for the stationary policy f , N/(f ) <1 V(tt') . (. <_ means 

for all elements with for some element) 

Proof. By hypothesis, 

L«H fP(flVCT') <v Cm) , 

LCf) -t j? TO)lL«Hp(0V(wi] < \j{^ } 

thuS ; 2 

'-t i ;hp(f)L(f).* /p(4)N(if) < V(Tr) . 

Continuing this substitution process. 



.< VU) ■ 
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Theorem 5 (Howard) If k is finite, then there is an optimal stationary 
policy. 

oo 

Proof. Consider any stationary policy ^ , then either 


(a) 


i C * , «C) + & pij (oil) ^ ‘ 1 ^ for a1 - 1 *1 


et\ 


or 


^ ' nt ■ ch 

for some 

and some u 

If (a) holds, then for any “F £ k/~ ^ T^* ^ ) the policy 

— '= (4, ° ^ is more costly than the stationary policy ^ » i.e., 

V Cf) .0/1*') 

CAS 6/0 . 

and by Theorem 3 ^ is optimal. On the other hand, if ^ xs not 

% 

optimal, i.e., there is some 1 for which (b) holds, then a new control 

* « 

function, IX , is defined such that for all l , 

i 

*) 

• , for case (a) 


u( V) = 




, for case (b) 


Then by the construction of U , for the policy . \\^ = \ ' J ' \ ^ ° 0 0 3^ 

NK-^hj < VOf) 

By Theorem 4, 

VU") 

■Oo > oo x • 

Thus, we have a policy, H , which improves upon ^ • Since r\ is 

finite, there are only a finite number of stationary policies. Thus, 

there is one which has no improvement and is optimal. 
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The motivation for restricting the' class of control laws studied to 

those that are stationary is contained in Theorem 2; it is seen that any 

optimal policy may be replaced by a stationary optimal policy. Theorem 5 . 

lays the basis for a constructive method of finding this optimal stationary 
♦ 

policy, Howard’s iteration in policy space. In the next section this 
procedure is explained. The set of admissible control policies is taken 
to be stationary; thus, for notational convenience the policy U \ 01 1 cl, - 
and the control function U fikJ are considered to be equivalent, and 

nM -v(cO . 

2.5 Howard’s policy iteration for Q < ju 


Before the method of policy improvement contained in the proof of 
Theorem 5 can be applied, there must be a means of obtaining the expected 
cost vector, \jtu) , for any U C- O" . Consider any stationary policy, , 
over \T\ stages, then let 1 

\X, U ,uV = t l ^ f j ( "S-fc ■■> u <• - ■£ 

» /1(a) 

. w / . .A (2.6) 

~ JU U') -v JS 2 .\ » 


or in matrix form 


- l_C^ “V ^ I s 

The stage cost function , uC*-)) is bounded for all z by 

definition. Let this bound be V\ . Then 

t 

Vv,Cu^ ^ VA ^ vAV. A ° * ° -t AAi 

. V- Cm,;. •,'Y t ' 
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^ C •» - “ ”V B 


<; .VA A ■for a\\ O < 8 < ^ 

' • . 

It is apparent that the sequence 

ir 0 ( L.u^OuC l;oi , • ••*, Or*UvU\ sV “ 

» 

is monotonically increasing - for all 1/ . • Since Ov l CC v 0k.') is bounded 3 
it follows that the limit exits. 

Say, Um U- n (C^'= o-ci;u3 • 

r\-^oo 


Co 


This limit is 

or the total excepted cost of applying the policy uf^" from (2.3).- 
Again taking the limit as oo ' from 02*6), 


or 


Thus 


and 


\r 4p pij'wa-Cj,^ , 

V(u , )=- L(u') + r’PC^'JCu) 

[l - = LCu) 


( 2 . 8 ) 


) 


■ \l(d - [x-^Ku)3 LW 3 


(2.10) 


if the -inverse exists. 


To establish the existence of the inverse', consider an arbitrary 
stochastic matrix ? . j&T exists if and only if 

A^.’V 'Ll— V ° » or d<S'V V 3 "^ 0 where 'X r ^ ^0 . However, 

for a stochastic trqatrix Ip,', all eigenvalues are of magnitude equal to 
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or less than, one. Thus, de/V ^.AX-^\ sr ° only if \ A\ { , hut 
implies \X\> X . Therefore, dfi.4- [_ X JT~ $ , or 

equivalently cd-^d ^ -L“ \ O , and the inverse exists. 

Another useful result follows immediately. For a fixed policy 
the cost is a continuous function of 'jB . Consider 


VC a) = tX - jL ?C^\ ' L.CU') . 

It is apparent that the elements of the inverse are rational functions of ^6 

with no singularities for • Thus is a continuous function of 

’ '* . ■ 
Howard's policy iteration is a two-step iterative process as 

follows: 


• (1) for a given stationary policy ti € VJ determine 

. = [X- iSPt'AY'uCu') , 

and go to step 2 with V- VC<i') ; 

T 

(2) for the cost function \| -( •* select lA^O^such that 

uC ( x)£/\ 

minimizes 

( W , <jl( **)) *+ yS 2-\ pcj (u C^i) i ^ ~ 4 j * * * > 

and repeat step 1. 

The process is terminated when step 2 yields no further improvement. 
The resulting U is the optimal stationary policy by Theorem 4 for a 
finite control set A. The last V generated by the process is the total 
expected cost vector for the optimal policy tA , The policy iteration . 

procedure can be started at either step 1 or step 2. If there is no 

> 

convenient policy to assume for initiating' the process, that is, if 
there is no policy suspected to be near the optimum, then it is attractive 



15 


to let V- C> initially. This results in the first policy iteration 
improving upon the stage cost — a reasonable procedure if no additional 
•knowledge is available about the optimum. 


2.6 Direct dynamic programming 

An alternative to considering the infinite duration process with a 
stationary control just solved by policy iteration is to examine a 
finite duration process. An optimal control sequence which minimizes the 
expected cost over A time periods is sought. The conventional dynamic 
programming functional equation results, and taking the limit as "A > C3 ° 
the same control is obtained as by policy iteration,. Consider 


= rr\ 


{ ip; 4 £ \ % &\£( *k,Uh(a^l - ‘*x \ 
lto “| ~ 1 • J 


= truv 1 Wc *li(.uoVaZ( ^ 

^ „ J * 

. \ + & Z, 

U<\-i L j ^ * 


VvxC‘0 = 


XY\W- 

u 


AA S^A'k J> 


( 2 . 11 ) 


where . L-fi is an arbitrary terminal cost, 

As before, the set .of cost functions 

{ or t ( iy j • * “ j 0^ (v 


n- 


is bounded for all 
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L - . » . . J since 

,0< Ovt'Oi s< VA C H V 

where W\ = yyng^c i. (^.'i^ ) 

Let vjy^ for ( u \,." ,T ; for this 

terminal cost U^L'i decreases monotonically . • To show this, observe 
that 

^oCD^or^O f or a11 ^ 

Now assume 


•U- VN U') ^ vT^CA 

and show 

Or^uV^ Or^CO 


for all 


•for all 1 



Thus Or ^ 5 and OVxU) is seen to decrease 

monotonically. .Again, since the sequence ■^U’ 0 C^ ^ C l.') ^ ^ 

is monotonically decreasing and bounded below by zero, it has a limit 
as Y\ -=? VO , say, 0" ( • Taking this limit in (2.11) 
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LX ^ - YY'U'A 

u 

Thus, (2.12) defines the expected cost function for an optimal policy over 
an infinite duration. Furthermore, it can be established that the solution 
to the equation is unique. Assume to the contrary that two solutions, 
exist with associated control functions q and S , Then 





( 2 . 12 ) 


O'C't') = Jli M -v ^ ptj (u) \y ( j 

6 fej ( j') . 

Subtracting yields, • ^ 



By successive substitution. 

Taking the limit as v\ tfo , 

Q (h)- G“W $ o for all t . 
By a symmetrical argument, 


^Cl'l - O-C-L'i for all- L, 

Thus , 

and the solution \/ = ( UClN, , UCiV) to (2.12) is seen to be 

unique. Also^y letting the control which results from dynamic 

programming is optimal for the original cost function (2.3). Since 
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\TU) 


vvjm | Jii M * |S ^ 

JUU*^ -vfi-Zv or eft 

J > ' , say 


There exists no 



such that 


3 



It is seen that the solution to (2.12) is the same as the solution of 
Howard’s policy iteration procedure. - Thus, the solution of the dynamic 
progr amm ing iterative equation- 


«• 


V'n - -V 

as Y\~=? t>° yields the same cost function as does policy iteration. It 
is also apparent that, if the limiting control function resulting from 
dynamic programming is used as a stationary policy, then this policy is 
the same as the one resulting from policy iteration. 

One important question still remains unanswered. What is the rate 
of convergence of the dynamic programming solution to the stationary 
optimum? As before, the sequence 

U'kU'w •••■ 

decreases monotonically to • 
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= ii Cu*) ■* j?, 2 ^ pcj( q*)'\J(.^ . 

^-- rw ^ ■liCu.'i -\ j£ 2J piyCu-so-^A-j^ 

^’viCy'i - OrCV> 4 |2, p^U^^o-^.Cj'i-O-C^ 

J 

€ n = v\<x* ^G^CWU^ 


then , 


^ p €wi * C2.13: 

The maxi m u m deviation of \ /^ from V thus decreases at a rate of 
.at least . Practical experience shows that this estimate of the rate 
of improvement is quite close. It is seen that for A ' less than about .7 


the rate of convergence is very rapid. 

The maximum error, <£,vt , is, of course, impossible to obtain during 
the dynamic programming algorithm since the final cost V is unknown. A 
bound on 4.^ can however be found. 


s n = mAx y oqco -a-v,.w \ > 


As before, for 


u- n a'i = X;tu*) + & ^ 

j x 

or nM UV ^ JUW) -V ^2, Y-ci vWj} 
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Thus,. 




and 


$ j% <Sv\~V 


However , 


6r,= 5 'A-W V ^yWL * ' ' 

^ Sv\ I. j$ fjZ * ‘ “ 1 


or 


^*'v% ^ i " 

T> 

Thus the error is bounded by the observable stage difference, On. 

The dynamic programming algorithm can be terminated when Svjgets 
sufficiently small. 

2.7 Howard* s policy iteration for 

The control of finite Markov chains with A” I (*■•£•» 110 
discounting) is somewhat more difficult to examine than the discounted^ 
cost chains. It is convenient to assume not only a finite set of 
stationary control laws, but also to restrict /\ such that for any W 
•the resulting Markov chain is ergodic. Before defining what' optimal 
. control means for the undiscounted costs, the behavior of the cost 
function is examined. 

Let, 


^ \l~ o ' 


(2.15) 
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be the undiscounted expected cost function for the stationary policy, 11 , 
applied to stages. Then, as before, 


( l < u) = c u> ) V 

with OljC 0 = 0 , 


• (2.16) 


or ' 


^ ( i,u) - A i (u) + S 4 pCj M A] tu) v " M* 4^’P £ i ^j Ca) 


In matrix form 

\j v iC’A V ) “ i—^') VV ("*“*•*) 


= l^v t^ujav- * • • + 


(2.17) 


By Theorem 1, 

A 

Pc^ ~ ljJl 4- 

where -*? O as 'H — ^ c *3 , geometrically fast. 


Consider, 


Vi yy% 


exists, 


n 

However , 


t\tf x\ ’ if the limit: 

n - v ^ W* 4 * 9 m=-d ^ 

— ■ _JUvv> Y1 ‘ ' Z luL-M 4* JjwA T) ' ^Qv^(^V*4}z) 

m?0 ' <a_»a» h>~ O 


^ ^ Qvw('-0 LC^Ol - o 


Y\ -=> 00 


since iil«^wv (u.^ •= O » 


OO 


Thus , 

/irn rf ' Vv)( a) = j/lLCu^J 1 


and for large A , 

V*<v\^ nQii-m +. constant 

VaCu^ ^ noCu.')!. -v- Vsl^Cp 


, say 


(2.18) 
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The scalar = M is the stationary average, cost of the 

policy U. , and the vector V\). A (u^ ' is called the potential of the policy. 
Substituting (2.18) into (2.17) ; 


*3^ t -^(vA ~ LMVfwi [<_*.,■) 3 m1 + w m wi] 

or 

( a) V 3C«) i ^ LU1 ^VCu-)^^ w _ 

■with N N)- v (u\= 0 
In the limit as ^ j 

Vsjlu^ -Vg(u)i - L(a^4-P(uY-\Aj((p (2.21) 

A stationary policy U V e.O"is said to.be optimal if 

' 3<.cvO ^ aA\ ue'XJ . 

That is, the optimal policy for \ is the one which accrues the least 
average cost. 

The question arises, does (2.21) determine ^( uS > aiK ^ W&V uniquely? 
To answer this, consider two solutions, Wj ^ and Y^d for the same policy 
^ . (2.21) immediately yields, 


sM-y H - Tw^-vl 

or 2. P(u^ -v e ■ 

” here C= (a- 3 ) 1 > ■a=MJ-V 

Thus , 

H. = nC VPW^ 

—s- ne + A-pZ , as 

However, the elements of ^ are bounded as i^ us C — O and <^-<2 . 

Therefore^ the stationary average cost is determined uniquely by ,(2.21). 



2'i 


Now, with. O > in the limit as -A— ^ ^ 0 J 

T 

-7 A' "K m ; - 1 oo i IA ' y & 

■7- SS 1 yU T&- ) Pj 1 1 i 

~ 1 r l 


The only solution to this equation is 


Hl ~ 


constant . 


Therefore, the potential, YJCu,) , for a given policy, uclj , is 
determined up to an additive constant. 

Howard’s policy iteration for undiscounted cost may now be specified 
as follows: . 

(1) for a given stationary policy, U £ L7" , determine and 

from W(u) from 

W (o' i -V Q (ul. uc L •h’pCtt) 

V 

and go to step (2) with W = WCu) 

(2) for the potential function, W , select U such that l jC 


minimizes 


Jl ( l ^C j U(‘X)) + 7U- pj(^) V\Ij 

and repeat step (1) . 

Again, the process is terminated when there is no further improvement in 
- Vs] , or equivalently when the policy '0- ceases to change in step 2. 

To show that the policy iteration indeed yields an optimum stationary 
policy, consider any policy 1^7 , then 

VvTCuvOA “ £i^\ * (2.21) 

sj 

A new policy, is generated by minimizing the right hand side of 

2.21. It is apparent that the additive constant in W does not affect 

A 

‘A . Now, 
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where 


gcu}-* JL LC^ * X 

* 

and "7 applies for some b . 

Thus , 

VsTC^u.^ -'vhr(l ( a) ^ ^ (2.22) 

A 

Recalling that for the stationary probability distribution, yd , associated 


A 

with U 


and multiplying, (2.22) by JX± and summing yields. 


v 3^- c(a) > 2» A*$ fvsrCji^-vrC^u^ 


or g (.U) - 3 ( U ) > o 

Therefore, <^( (3") and t ^ ie P°li c y> U , generated by policy 

iteration" is superior to U , the policy which preceded it. Since there 
are only a finite number of policies eventually there occurs a policy 
which can not be improved upon in step (2). This policy is the optimal . 
policy . 


2.8 The optimal control as 


It is interesting to consider whether the control obtained for jS>< \ 
but sufficiently close to one is the same as the control for . 

Let 4\ be a fixed policy arbitrarily close to one with an associated 
optimal policy U ; ■ Then call the second best policy the one with 
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minimum maximum deviation from 


'Vi 


CA 


the dependence on 


f 


)^) 


where the notation includes 


Thus the second best policy is such that 


mo.* \ U- 

i 


is minimized. Now, since for any fixed U 


is continuous in 


it follows that there exists a ^ such that for all 


■/ * 


in 
\ the 


cost of the policy lA is less than that of the second best policy. 

Holding this policy fixed, a potential type function is defined for 
j!> a<^\ so that the optimal policy as l may be examined. Let, 


' ^ N/*Cu* f) - Z* |T i-. 

where is the average stationary cost of 11 for 'j%—\ , and 

V n )§ 8*?V)Uu«) 

v\ w ' . \s.=T> I 


( 2 . 24 ) 


Since 


21^ ^ ^(u^l is^ constant with respect to l ^ finding the 


control (j. such that U(‘*) minimizes' ^ ItCu') ^ ' % puj M 
is the same as the control which minimizes ? . . . , .. ,t- / ' -s 

Now examine the potential function as \ , 3 

JJ- • k r ■ 

^(u*k)= 2i jl 

1 | } J \-p 



3^ -V ^ - «CV) h 


. v O ' 





26 


In the last section 


A 

/r\ 


^ fc Ca < )LCu / ) = W(u*> 


existed for undiscounted cost; thus 


jljsrri J?tsrr> VVVi ( ft) — VJ(U*) 

rj-'pao A-*? I r 


and further it is apparent that. 


j lyY1 W* (o* i) = vJC 

n— ' 

JUro Vvn(U*./0 ■= N/Cu* 

Vi 1 1 


V\ i 1 

Since the control policy obtained by applying step two of Howard's policy 
iteration to either \Z'a{^ ji') or is the same, it follows 


that the optimum policy is valid for 




$ \ . Thus the undiscounted 


problem can be solved by solving the discounted problem for ■& . sufficiently 


close to one. 



- CHAPTER III 


A NUMERICAL ALGORITHM FOR OPTIMAL CONTROL 

3.1 Introduction 

In Chapter II the characteristics of the expected cost function were 
examined, and two methods, Howard's policy iteration and dynamic program- 
ming, were developed for obtaining the optimal control of finite Markov 
chains. . In this chapter stochastic systems whose state space is defined 
on the continuum are considered. However, rather than view these systems 
rigorously as infinite state diffusion processes, they will be considered 
as finite Markov chains with the large but finite discrete state space. 

A numerical algorithm which employs a quadratic approximation to the 
expected cost function for a partitioned state space will be developed. 

3.2 System description 

The systems to be studied are defined by a set of difference ' 
equations 

. (31) 

called the plant equation, where 
time parameter 
n-dimensional state vector 
q-dimensional control vector 
n-dimensional random vector, plant noise 
n-dimensional vector function. 

The state is restricted to the state, space 

X and any transition out of this 


k = 
*x = 

(X = 

5 - 

n _ 

T - 
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region under (2.1) is not considered. The control «. (o{ () -- . , a^) is 
restricted to the control space . The random variable ^ , called 
.the plant noise, has a known probability density function, , 

which is time invariant ^and ^ is independent from one time instant- to 
another. If it is desired to model a system with correlation between 
plant noise from one time instant to the next, it is possible to define 
additional state variables and new random variables for which the plant 
noise is independent (Meier 1965). Also, with no loss of generality 
is considered to have zero mean. 

Stochastic constrol systems with continuous state space can be 
considered, as an approximation, to be finite Markov chains by establishing 
a grid on the state space . The grid points are states of the finite 
Markov chain and the transition probabilities, j , defining the chain . 
under a stationary control law, are obtained by determining the probability 
of a state transition from on the grid to a hypercube about J on 
the grid. To better illustrate this, consider the second order system^ 
in Figure 2-1. The transition probability under control o( is 

defined as 





(3.2) 


J X Z - 

The stage cost at time ^ is defined ^as before ^ 

O < £( xCtt'i ,’oUtV) < ^ 

The total expected cost function is, as in Chapter II, for a stationary 


control law, u, , 




’ ,‘~ure 3.1 Grid for discrete; state approximation to a continuous 
state system. 
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A ( tccu') - A 1 

Again the control law t_x with U OcCfc))) C r\ is sought which minimizes 
'V~(x s ) for all ">cO£ ^or f or the finite Markov chain representation ,311 
which are grid points. 

As before, 

U - (->0 - X C^o, -V ft £ (3 - 3) 


"U" 




i 


3.3 Solution by Howard's policy iteration 

To find the optimal control via policy iteration it is first necessary 
to model the system as a finite Markov chain. A grid must be established 
which is sufficiently fine to approximate the behavior of the system 
defined on the continuum. Dividing each coordinate into K {{ equal 
increments wide accomplishes this for small enough , 


N t = 


"X fn 0.x‘ w xrvii r) i 
J\ ?C jl 




n 


‘h 


and defines J" ~ grid points, 

t -l 

Now to obtain transition probabilities under the stationary 

control U. it will be necessary to perform the integration in (3.2) 
times. Then having attained the OXJ transition matrix 

step one of the policy iteration procedure (Section 2.5) requires 
inverting 


L 


I - 6^1 


J 


A 


also, a ^ j matrix. In the minimization in step two, it will again be 
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necessary to evaluate (3.2) J times for each control law considered. 

The number of control laws considered will depend on the numerical 
minimization technique used, but it is evident that this number could be 
large even for limited control spaces. To see the prodigious labor 
necessary to employ Howard's policy iteration for systems with continuous 
state space, consider a second order example with 

j l £ 'X, $ 300 > i 

— 4- 

and let A"x_i~ t • Then Kl, - - ioO , and J - lO . Thus 

P has \0 S elements as does, • Already it is evident that while 

Howard's policy iteration is a valuable technique for finding the optimal 

control of finite Markov chains with very few states and a useful 

theoretical tool, it is impractical to employ it on the systems defined 

in this chapter. It would be necessary in the present example to store 

100 million transition probabilities in computer storage and invert a 
4- 4- 

{0, K \0 matrix to achieve only step one of the first iteration of 
Howard's method — clearly an overwhelming computational task. On the other 
hand, it will be shown in the next section that dynamic programming as 
developed in Section 2.6 offers a more palatable numerical solution. 


3.4 Solution by dynamic programming 

To employ dynamic programming, as before, a N-stage minimum expected 
cost function is defined. 


or 


hi) •- E 




A fA C b)> U k ( *< «'A 1 * (e Y--X* \ 

lc-o j 


VCX, kO “ miM { X ( Tto , tX(-Xo'j) a/( i 

Uu I t 


0-3b) 
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with V_T C -=■<> . Again a grid is imposed on the state space with M c 

• A 

increments along the axis and 3 -TT hi ^ total grid points. It would 

i - 1 

now be possible to employ (3.2) to define the TvCj" transition matrix P and 
(3.. 3) would become, as in the last chapter, 

IT( L -x - rrw'w j 1 C ^ ( > ^ 

u L / j 

for all the grid points. However, to avoid the difficulty of obtaining V" , 
a more convenient approximation is to quantify the noise in a manner 
similar to Imposing a grid on the state space. That is, the probability 
density function is approximated by imposing a grid on the domain 

of \jc' and attaching a probability to each grid point. -Then the noise 
is described by the set of noise values ^ ^ ( and the 
associated probabilities, ^ pC ) L-* P •** 


Now 


equation (3.3»}becomes, 


■uC v,w)= < 3 -« 

/ J 

0~( W t o') - o for i. ~ (,7, ■ • jT • Equations (3.1) and (3.4) describe 
the dynamic programming numerical algorithm for the solution of the 


stochastic control problem with discounted cost. While the dynamic 
programming functional equation (3.4) offers a solution to a wide range 
of problems analytically, the computational requirements of high-speed 
computer memory and computing time can become excessive except for simple 
problems. The memory requirements are the same as for deterministic 
problems while the computation time is more severe. To better observe 
these difficulties and to see that Bellman's "curse of dimensionality" 
not only affects memory requirements but also computing time in the 
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stochastic control problem a more detailed examination of the algorithm 
is in order . 

Since it was shown in the previous chapter that ~ ^ C 

' r ; ---••£=<? 

there is no necessity to store all the cost functions and control functions 

generated as (3.4) is solved. Only the last cost function and the present 

cost function, and control function that is being generated, need be stored. 

__ *\ 

Thus, 3 13 3 = 3 v\ H memory locations are required to store the inf or- 
i*i 

mation vital to the iteration of (3.4). Further, for economy in . 

computation time, these values should be stored in high-speed memory 

(Larson, 1968) which for most computers is limited to about \0 words. 

Thus for the second order example of Section 3.3 it would ‘be necessary 

4 

to have available high-speed memory locations. For a three 


dimensional state space with H - \0<h ,(Mv't v 'L 3 11 V ^ 


storage 


locations would be necessary, overwhelming the capacity of nearly any 
computer. This '‘curse of dimensionality" is a severe limitation to the 
problems solvable' by dynamic programming. A first order problem is shown 
in Figure 3.2. To evaluate U ( ^ ( \z) x/ith the control U i tc ) applied 
it is necessary to evaluate by interpolation of the stored 

cost function at time feU times xrfiere is the number of 

discrete noise levels used to approximate the probability density function 




For a second order plant with 


x t (w, = t v ( ^Ciz-),u) 4- f.no 

~ Tr It-} , u) + p 

and ^ independent of C y^ then both ^ and ^ could be quantified 
separately into sa--.; and levels. Thus ^ f v '1 ? and in general 
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Figure. 3.2& The dynamic programming numerical algorithm for first 
order problem. 
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for a n order plant H c - \\ yA t „ , and the cost function must be 

evaluated 3 times for each iteration of (3.4). Consider each 

C- \ 

noise element quantified into, say, five levels. The number of cost 
function evaluations necessary for the stochastic problem as opposed to 
the deterministic problem ( ^ increases by a factor of 

five for each increase in dimensionality. Thus the "curse of dimen- 
sionality" affects the computation time of the stochastic problem with 
respect to the quantization of the noise. It is the main purpose of 
this dissertation to develop an algorithm which alleviates the high- 
speed memory requirement and long computational time intrinsic to a 
straightforward application of dynamic programming to the stochastic 
control problem. The next section begins - the development of this 
algorithm. 

• 3.5 Dynamic programming with a partitioned state space 

The problem of excessive high-speed storage which is attendant to 
the dynamic programming algorithm was attacked with considerable success 
by Larson (1964, 1968) for the case of a deterministic plant and 
continuous time, i.e., 

Larson’s method, called state increment dynamic programming, took 
specific advantage of time being defined on the continuum. This restric- 
tion and the deterministic nature of his plant equation thwart a direct 
application of his technique to the discrete time stochastic problem 
under study. However, a basic concept of Larson's method will be 
employed for the problem at hand. State space will be partitioned into 



blocks, and these blocks will be treated individually in calculating 
the optimal control and cost function. The expected cost function, 
l over each of these blocks will be approximated by a quadratic 
surface. The effect of this partition and the quadratic surfaces is 
to substantially reduce the amount of high-speed memory necessary and 
also to reduce the computation time. The price paid for these advan- 
tages is a more approximate control law than that achieved by 
conventional dynamic programming. However, the classes of systems 
examined will be restricted such that this loss of accuracy is not 


substantial. 

To better illustrate these concepts, consider the second order 
problem and two dimensional state space in Figure 3.2. Here the 
state space has been partitioned into 25 blocks of equal dimension. 
There is no advantage in unequal dimensions ^so for simplicity equal 
dimension blocks are used for the partition. The expected cost 
function is also partitioned into the surfaces above each block. In 
the figure the furface partitions above blocks 0 and 5” are illustrated. 
These surfaces are then to be approximated by a quadratic fit which 
in the two dimensional case will be, for block X 5 




and for the T\ order system 

A 


n 1. 


(3.5) 


■o'(x)= «(<)+ 'X + A %%(*)*•* 

i=\ 1 ( '" 1 T l 

The block size is selected such that, as illustrated in 
Figure 3.3, when is under consideration and control W is applied 






■j'j 


lies in the block containing l ?t or an adjacent block. This 
condition can be met easily enough by making the block size very large. 
However, since the cost function or surface over each block is to be 
approximated by a quadratic surface, it is also desirable to have the 
blocks small in size. Thus, a compromise must be reached, and this 
compromise obviously depends upon the problem being solved. A 
reflective examination of the system equations is usually adequate to 
determine an appropriate block size. 

Consider for example that the state space in Figure 3.2 is ’ 

Xmint- -2<^ end 2-X and that 

- l • Thus, each block would have 100 points in it 
(including its boundaries) with 10 increments to a side. The cost 
surface above each block would be described by 6 numbers, & , 
and ^ . Since for each a member of block JL ( X. g 3 £ ^ J-(x 3 u) 

is restrained to be a member of either ^ or a block adjacent, it is 
possible to evaluate (3.3) for all points in B/ with only the 
parametric description of '3j? and its adjacent blocks in high-speed 
memory. Thus, recalling Figure 3.3, only high-speed memory 

locations are necessary to store the cost surface for the partitioned 
state space algorithm. For conventional dynamic programming jjT} ir2 < S't?0 
high-speed memory locations would be necessary. 

Obviously, even for conventional dynamic programming it would be 
possible to store the entire cost function in low-speed memory (tape, 
disc, or drum storage). However, then it would be necessary to go to 
low-speed memory for each cost function evaluation. This is a time- 
consuming process which would involve K 1 I! N(TTi'^ tc accesses to 

i = f C~i 
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low-speed storage where is the number of controls evaluated at each 

state point. With the example considered in Section 3.3 would 

require \ O “ ICO ' 3 V& O O accesses to low-speed memory. For the 

partitioned state space (PSS) algorithm only accesses would have to be 
made to lew-speed memory, where is the number of blocks (25 accesses 
for the problem in Figure 3.2). In the next section the PSS algorithm is 
shown to reduce computation time as well as high-speed storage. 


3.6 The quadratic approximation of the cost surface 

The criteria for fitting the quadratic surface to the cost function 
over a given block is taken to be unweighted least squares regression. 
For block 6^ recall. 


~~ *1 n J- 

tr« (%■)= <xu)v ZiSiU)-x, I-Z2-, 

> } L~' V‘\ 




j 


(3.5) 


and the functional to be minimized is. 


} 1 1 * 

for the V\~ ZZZiZ. parameters of the quadratic surface. Thus 


1 _ 



and 


te- 'i • • • , ^ 
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C ^ Jl 

-rl jy x i x fc x m^ Z/L‘^Cj Xi'/y . 

**vjA vW L "‘ r * J 


6c) 


V- ij • • * , n * vv\ - L •• > )W 


(3.6) may be summarized in matrix form as, 

S B.CJO = T(t) • ' 


(3.7) 


where 

and 


T 

■tii = ) 




^ C ^") = C U (k) ) TC , U*£k) v 1>>J * ^1 O'W) 5 ac*>) 

♦ -.o' 


•7C6-i5a 


are \xV\ column vectors, and S>is the VAXVA matrix described by (3.6) 
such that (3.7) holds.. Thus, the column vector, "2c , describing the 
Quadratic surface is - 




Z<£)= 5 TU) 

It i-s not necessary to invert a 3 matrix for each block; instead, since 
all blocks have the same dimensions ; $ , T may be calculated for a block 
with standard coordinates, and tT^x) transformed to this block. Thus the 
V VlA matrix S need be inverted only once. Further, the storage for the 
surface for and adjacent blocks is locations. Thus, 


=£> H ^ ^ 

^ 5 - 21 o 
nv 4 — > *k~12l$ ; 


r\~~L 


U r 


To see that the quadratic approximation not only reduces high-speed 
storage requirements but also computation time, recall (3.3) 


lT(tC,VlV^WNA.^ AC^lC* b.7 VJ ( TV X 4 V 



For the noise quantified into values (3.3) became^, (3 .4) , 

U" ~ fru-v. $ A('CvU) -V; B 2l V + 'Sj \ 

U. I ! 

Therefore, it is necessary to evaluate crC^KH^ times for each 

control considered where Ne will have a tendency to increase geometri- 

' 5 

cally with the dimension, f\ . On the other hand, for PSS dynamic 
programming with lying in block Jl and parameters "£,(-0 descrxDing 

XT Ca£|N“i) . for of C- fie , 

^ ( 7<: > M | Jl (Xt a) 4/> b ^ IT ( ft*, a) + ^ \ 3 


or ; approximately, 

' U 


mx'v-i + AeV«)+ Z, AU) (£<*•“'> -20 

U ^ i‘~t 

-V Z.Z| >s ^ 




^ ^ M **« / Es j'. 


i-\ i'\ 


Thus, only one cost function evaluation must be made for each control 

‘ Ji (. 

and the additional term, ^ ^ U) E 

S~\ 3 


S) 
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calculated using known covariances, . The cost function 

evaluation is of \r(-x v V\-\) rather than ; however, the compu- 

tation time of the two evaluations is comparable. The quadratic 
approximation to the cost function 3 therefore, affords a significant 
savings in cost function evaluations and computation time. 

3.7 PSS algorithm 

Once the state space has been partitioned, the PSS dynamic programming 
algorithm can be applied. A flow diagram of the basic procedure is 
contained in Figure 3.4 while a more detailed flow diagram and Fortran 
program listing are to be found in Appendix A. 

A particular block, is designated as the origin block (for example, 
block 1 of Figure 3.2) and the cost surface associated with it is determined 
by techniques to be discussed in Section 3.9. The origin block is 
generally selected to contain the minimum of the cost function over all 
state space if possible. For many problems it is easy to define the origin 
block appropriately, such as the stochastic regulator problem where the 
system is to be driven to the origin of state space.. 

With the cost surface for the origin block obtained, another block, 
say ^ , is considered for processing (Step 2). Borh this block and all 
adjacent calculated blocks are brought into high-speed storage. The block 
being orocessed must have at least one calculated block next to it. this 
is not a significant restriction on the method, as, in general, the blocks 
are ordered in such a manner that they radiate out from the origin block 
as they are considered (Figure 3.2). 




Figure 3.4 Flow diagram for Dynamic Programming with Partitioned 
State Space. 
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The optimal control and cost of each point is calculated 

(Step 4) by 

= w ^ I? \ b- m 0.8) 

where with l&'Cn) know, or is the closest block to the 

point i° r which 2 (yv\) is calculated. The iteration variable hi 

has been suppressed since the blocks will be stored back in the same 
location after they are processed. That is, the stage identity is 
destroyed. The set of costs, ^U-(x)|x e^j? > is then fitted (Step 5) 
with a quadratic surface, ZU) . For the first pass through state space 
^b^ODE- ( ^ , the control for and the parameters of adjacent blocks 

are then placed in low-speed storage (Step 7) .and Step 2 is repeated. 

After all of state space has been considered once, the algorithm goes 
into MODE -2. (Step 8). For all subsequent calculations ^(XjU^ is 
assured of lying in a calculated block for the evaluation of (3.8). Also, 
a comparison of the present cost surface and the previous cost surface over 
the block is made (Step 6) to determine the convergence of the algorithm. 
Convergence is guaranteed for \ by (2.12). The process is continued 
until convergence is attained over all of state space or until a maximum 
number of iterations is reached. 

3.8 Block processing order 

•Before the algorithm described in the last section may be applied, 
the partition of state space must be ordered; i.e., an integer must be 
associated with each block which determines when it will be processed 
during a pass through state space. The only restriction upon this ordering 
is that each block be adjacent to a block previously processed during the 



current processing sequence. - This restriction causes the blocks to tend 
to radiate out through state space from the origin block as they are 
considered. There is, however, reason to be more selective in the 
ordering. Namely, it would be ideal if the optimal control, U , at a 
point y, always caused to lie in a block which had already been 

processed during that pass through state space. This could be accomplished 
if the optimal control were already known. The block ordering could be 
taken opposite to the direction > that is, opposite to the 

direction of the expected transition from ~X. under optimal control. 
Obviously, if the optimal control were known, the problem would be solved; 
however, in many problems although the optimal solution is not known, 
there is some knowledge as to the manner in which the system should be 
controlled . 

This idea was made explicit by Larson with the concept of preferred 
direction of motion . The preferred direction of motion is, basically, the 
expected direction in which the trajectories of the system tend under 
optimal control. The information used in establishing the preferred 
direction is a priori and rests on an intuitive feeling for the system's 
behavior. The blocks are then processed opposite to the preferred 
direction. 

If the preferred direction is not known, the algorithm still works 
and will converge, although more iterations over state space may be 
necessary. Thus a general technique for ordering the blocks in the 
absence of a preferred direction is desired. This objective can be 
achieved in the following way, again suggested by Larson. Let the blocks 
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be designated as in Figure 3.5 where ^> Q is the origin block and is 
defined to have coordinates • The blocks 8 , } • * * ( B,<<, are 

said to lie in layer one ( i ^ 'A '? w in layer two . 

etc. These blocks have coordinates, 

^> v •= Co a) 

(o t -0 

» 

K 

(-v, 0 

i 

i 

i 

6,4= ,Z) . 

The ordering is achieved by counting with 2-digits modulo h'\- 2l_v\ 

for the blocks in layer hi . Take for example layer one; counting 

yields 00, 01, 02, 10, 11, 12, 20, 21, 22. These numbers M0D/3 are 

associated with the block coordinates (0,0), (0,1), (0,-1), (1,0), 

(1.1), (1,-1), (-1,0), (-1,1), (-1,-1), respectively, and the block 

ordering through the first layer is achieved. For the second layer 

counting M0D/5 yields 00, 01, 02, 03, 04, 10, 11, 12, 13, 14, 20, 21, 

23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44. The M0D/5 digits are 

associated with the block coordinate elements as follows: 

O /<■ O 
V ’A0b/5' — > l 
Z.i4^/S 
5 VAO^/^r =~> 2. 

4 v~UA~//y '->-2 














Thus, tlie sequence of MOD / 5 numbers corresponds to the block coordinates 

(0,0), (0,1), (0,-1), (0,2), (0,-2), (1,0), (1,1), (1,-D, (1,2), (1,-2), 

(- 1 , 0 ), (- 1 , 1 ), (- 1 ,- 1 ), (- 1 , 2 ), (- 1 ,- 2 ), ( 2 , 0 ), ( 2 , 1 ), ( 2 ,- 1 ), ( 2 , 2 ), ( 2 ,- 2 ), 

(-2,0), (-2,1), (-2,-1), (-2,2), (-2,-2). Deleting those coordinates in 

layers lower than layer two results in the sequence, (0,2), (0,-2), (1,2), 

(1,-2), (-1,2), (-1,-2), (2,0), (2,1), (2,-1), (2,2), (2,-2), (-1,0), (-2,1), 

(-2,-1), (-2,2), (-2,-2) with the associated blocks . } * 

This counting procedure can be carried out through an arbitrary number of 

k •• it 

layers and for a V\ * order system. The y\“ order system would require 
counting with A- digits A detailed flow diagram and program 

listing for the ordering of blocks is in Appendix C. 


3.9 Calculating the origin block 

To initiate the PSS algorithm it is necessary to calculate the 
quadratic cost surface associated with the origin block for the first pass 
through state space. This can be done either by dynamic programming 
using quadratic approximation over the origin block or by policy iteration 


also employing quadratic approximation. 

Howard’s policy iteration has application in finding the cost function 
of the origin block for the continuous state space stochastic control 
problem. Again, let the quadratic cost surface over the origin block be 
described by 


Vi 


? Q- x 

/d__ Xi. ^ L 

i"l S 


V, l_ 


\j be") = U a- 

C-i y\ 

Then for a fixed policy U C-VJ defined for all grid points in the block. 


it is desired that 



50 


V ~ JL (?c t u 0 ) (?> £ j if C ( '‘q) * ^ 5 


(3.9) 


However, there are ^in general^more than (ft £)/'£_ points in a 

block for a vl order system. Thus, a least square equation error criteria 
is used to determine the quadratic fit for the cost function. That is, 
letting 

( 


a ( a») -v £ h r t r\ i(^ t <Jo) +■ t 
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the functional. 


/ 
{ 




V 


(«0 , 


mxza- 


7= ^4 e* 

-y 

is minimized with respect to °^\ jf • ' ^ /3 VV v x {^, v \ VVA . This minimi: 

tion determines a set of linear equations which in turn define the 
quadratic surface, l)T^(h>Cb, associated with the policy lAtj . This 
surface is then used in step two of Howard's policy iteration to determine 
a new policy £\f . The policy iteration is carried out until conver- 
gence. It has been found numerically that while this procedure works 
well at the origin block (containing the minimum point of the cost 
surface) it does not converge well for other blocks. Thus, it can not be 
used to find the cost surface for blocks other than the origin. 

A second technique to find^ the cost surface of the origin block is 

to employ dynamic programming. Assuming a terminal cost of zero, the 

T> 

dynamic programming algorithm can be applied to each point in Do > i.e., 

C • . . "> 


\Sfyf r. } A \ 

I > l } 
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This cost function is fitted with a quadratic surface o C’t} , then. 


Or OO 






ZrOfC*,uY 


t) \ 


(3.3) 


is calculated for all • Again a quadratic surface is 

fitted to the cost function XjYy^V and (3.3) applied. This procedure is 
carried out until convergence with the speed of convergence described in 
('2.13') . 



EXAMPLES 


1) 


Scalar examples 
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Problems (a) and (b) were solved by both dynamic programming and PSS 
dynamic programming. The state space was partitioned into 


Bo- V 2 ^ ^ - : M 


l P ; <X < -(^ - TR r \ - 

\ > b, - I 2 - ^ , b-x \ 


{, $ ■* 
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P,, =■ { -VL 4* $ io"i o-v.4 %*= V'° 4 ' X 


I 


The percentage difference by the two methods in the final cost 
functions was less than 3%, and the control functions ware identical. 
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2) Second order example 
plant equation 


noise 

discount factor 
state space 


^ \ C. — -<r C in ) -v •>:? ( vt'5 v c , £ 

'O 

Ct+O =• •- u -v 

^ s C V- '} cx , N A ^ C \t) u^A*^Lv-- 4 ? - v ^. 


v - ^ ' 


n 

' S“ S3f,5s“ 


-<T 5' ^ 3 b 

control ~2 p i' U S2,?_ 

^ — 

grid - /x'X , ~ A.pd- r . - <• p 

partition - blocks are square with side 2 units long ..i.e., 
25 grid-points per block 

This problem was solved by both dynamic programming and PSS dynamic 
programming. The percentage difference by the two methods of the 
cost functions was less than 5.4%, while the control functions were 
identical (within the accuracy of the search). The PSS- method took 
approximately 1/5 the computation time of standard dynamic program- 
ming. For the noise the cost functions x?ere within 10% 

of each other, while the accuracy of the control was unaffected. 

3) The discount factor interpreted as a reliability probability 
Let = Prob [the system does not fail in one time period] 

| £=:Prob [the system fails in one time period] , 

and, let 

Jiix.tS) be the stage cost of operating, and 

\ — be the cost of failure. Now the total expected cost is. 
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Thus, the cost function to be minimized for the discount factor, jj , 
interpreted as a reliability probability is the same as before 
except for an additive constant which does not affect the 
minima. Therefore, the PSS algorithm can be applied to problems of 
this nature. In particular, say, to a nuclear rocket control system 
where the control, U , is applied briefly at the start of a control 
period and the rocket is allowed to coast for some time with the 
probability of a system failure being o . The study of a particular 


system of this nature is under way presently. 
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ABSTRACT 


The nonlinear equations describing nuclear reactor behavior 
due to reactivity feedback from variation of temperature and 
moderator density in the core are analyzed. Parameter spaces are 
defined, and stability boundaries for the linearized system are 
determined. Analog computer solutions of the linearized equations 
are presented as verification of the stability of the system. 
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CHAPTER 1 


INTRODUCTION 

A large amount of work has been done in the past on the problem 
of analyzing a nuclear reactor with two reactivity feedback mechanisms 
for stability. In a recent doctoral dissertation,^ Schmidt 
concentrated upon a system with feedback from two different temperature 
regions. In his work, a parameter space is defined, and stability 
boundaries are plotted. In the following paper, a method for determin- 
ing coordinates for such parameter spaces and stability boundaries 

in general is demonstrated, by application to a system presented by 

( 2 ) 

Smith and Stenning. 

Analog computer solutions to the linearized system equations 
are presented in the form of state space plots of the system power vs. 
that system state which gives rise to the prompt reactivity feedback. 
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CHAPTER 2 


OPEN LOOP SYSTEM EQUATIONS 

The differential equations which describe the system to be studied 
are presented in this chapter. These include the point reactor model, 
the prompt jump approximation, and the reactivity feedback model 
corresponding to feedback due to a temperature variation and a gas 
density or pressure variation in the core. 

The equations are normalized and linearized about an equilibrium 
operating point, because the normalized form is more convenient to use 
in the analog computer simulation of the system, and because the 
method of stability analysis to be used is applicable only to the 
linearized form. The feedback loop transfer function is determined, 
and its pole-zero plot is included as an aid in visualizing the system 
dynamics . 

Neutron Kinetics Equations 

The point reactor model is well known , and its derivation will 
not be repeated here. In the case of one delayed group of neutrons, 
the source free equations may be written 



where N is the mean neutron density or power 

T is the mean delayed neutron precursor concentration, 
p is the reactivity of the system. 


2 



I is the neutron generation time, 

3 is the delayed neutron fraction, 

and 


0 = 


m 


*i 


i=l 


A is the mean weighted precursor decay constant, 

1 - — 

X ' o 


L - i A ; 


Since the derivation of this lumped parameter model assumes that 
the delayed neutron precursors remain very close to the spot at which 
they were created, it is not entirely applicable to the case of a 
rocket engine in which the core is made of graphite, and the precursors 
are said to diffuse rapidly and may be swept out with the propellant 
before releasing a neutron. This difficulty may be partially circum- 
vented by using a modified value of g, the delayed neutron fraction. 

The equations are normalized about the equilibrium operating 


values N and F : 
0 0 



The linearization is accomplished by expanding these equations in 
a Taylor series about the equilibrium operating point Xq, neglecting 
higher order terms. 
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In terms of the original system variables, fi X- = 


X - X, 


X, 


If ^ is small compared to |- ^ — N | or XT in equation 2-la, it may 

be neglected. Equation 2-la then yields F = — N.. 

Substituting this value of T.into equation 2-lb and simplifying. 
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t**) 


b! 




This is the nonlinear prompt jump approximation to the one delayed 
group point reactor kinetics equation. Normalizing about the equil- 
ibrium point as vas done previously. 



Linearizing , 




i~L' <Z) 
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' Feedback 'Equations 


The feedback system to be studied represents a proposed nuclear 
rocket engine in which the reactor is used to impart high energy to 
hydrogen propellant, which moves through the reactor, acting as coolant 
and moderator, and is then expelled from the nozzle. The prompt re- 
activity feedback mechanism in this system is the temperature, which 
causes expansion of the graphite core. An increase' in temperature leads 
to an increase in hydrogen moderator pressure, or a decrease in density. 
This is the delayed feedback mechanism. 

The nonlinear equations for temperature-pressure feedback are 
given below: 



^ N 


0 { T T 


lz- 1 *\ 


i'X> 

6o 







hC T 






■V f* 
\ 



where 0 and ^ have replaced the somewhat more complicated coefficients 
or Smith and Stenning. The external reactivity term is necessary to 
insure that p Q = 0, since the temperature and pressure are absolute 
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quantities and are always greater than zero. 


Upon normalization. 
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where c and d are defined as the normalizing constants for the equations 
describing the behavior of the system variables which control the prompt 
and delayed feedback mechanisms, respectively* 
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The useful relation d>2 T o " $ 3^0 discovered as an equilibrium 
condition of equation 2-9b. 

The normalized equations are linearized to 
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These equations are in the standard form x = Ax + bu, y = c. x, 
with <5N T as .the control input u, and the output y = Sp’. 

The feedback loop transfer function is given by 

-? cTT [ V> 
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For real roots, the discriminant in equation 2-13 must be greater 


than or equal to zero, or 


G „ 

T' 


t.0 


ox 



C\ P 
s \ O 



A pole-zero plot for the feedback system for c/d <.20 is shown in 
Figure 2.1. 
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Figure 2.1 Pole-zero plot of -H(s) for Temperature- 

Pressure Feedback 


Unfortunately, the position of the poles and zero is not independent 

of the equilibrium point about which we choose to linearize and normalize 

our equations. Had this position been determined only by 0 and 

we could choose any equilibrium point and derive stability criteria valid 

for every equilibrium point in the linearized system. If the system 

states varied at a reasonable rate, we could assume the system remained 

close to some equilibrium (not necessarily the starting equilibrium) 

* 

at all time, and their position in the parameter space relative to the 
boundaries remained fixed during a short-term perturbation. 
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If the quantities which define the stability curves are functions 
of the particular equilibrium point, however, the effect will be to 
move the stability curves around the parameter space during an excur- 
sion. Whether this results in a larger or smaller region of stability 
than predicted remains to be seen. 
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CHAPTER 3 


CLOSED LOOP SYSTEM 

In this chapter, the neutronic behavior predicted by a) the prompt 
jump approximation, and b) the one delayed group point kinetic model is 
coupled with the feedback equations. A parameter space is defined, and 
the stable and unstable portions of it are determined. The results of 
analog computer solutions of the linearized equations corresponding to 
various points in the parameter space are presented. 
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This is the closed loop system matrix. If its eigenvalues all have 
negative real parts, the system is asymptotically stable. A necessary 
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condition for this is that the coefficients of the system characteristic 
equation not change sign. We will form the characteristic equation and 
compute the Hurwitz determinants. The conditions assuring their positive- 
ness will lead to the stability boundaries in the parameter space. 

The characteristic "equation is 
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The Hurwitz determinants are 
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if we let 
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and set the determinants equal to zero, the curves defined will be the 
boundaries between stable and unstable systems. H3 is the static stability 
line; H2 the resonance line. The space, as it might appear when typical 
operating parameters are substituted for b,c, and d, is shown in Fig. 3 . 1 . 
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The static stability line H3 is in this case situated on the y- 
axis, which means that it is independent of the equilibrium value we 
choose to calculate c and d. Furthermore, the analog computer studies 
detailed in the next section show that the system behavior is quite 
insensitive to the value of y chosen if x is held constant. The effect 
of the "moving" stability boundaries referred to in the last chapter 
should therefore not be too great. 

To determine the region of stability if the prompt jump model is 
replaced by the one delayed group point reactor model, we will repeat 
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the procedure outlined previously; i.e. find the linearized system 
closed loop matrix, determine the characteristic equation, form the 
Iiurwitz determinants and set them equal to zero. The resulting equations 
are the stability boundaries in a parameter space whose coordinates 
are the product of the feedback coefficients and any set of constants 
by which they are consistently multiplied in the equations. 

The system is defined by equations 2-3a, 2~3b, 2-lla, 2-llb, and 
2—12. If the closed loop system matrix 
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is manipulated as before, the stability boundaries are 
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Analog Computer Simulation Results 


The University of Arizona’s Computer Systems, Inc. 5800 analog 
computer was used to simulate the linearized equations to verify the 
stability plane results. For these runs, two specific cases were 

3 

chosen; one corresponds to the parameters used by Wiberg and Woyski, 
for which the feedback loop poles are real, and the other is a fabri- 
cated case in which the feedback loop poles are complex. 

Case A 

A system with No = 2000 MW, T = 2000°K, 0 = .2, cL/eL = .06. 

o 13 ' 

.0745, and = .1 sec ^ would have normalized time constants 
b= .1, c= .2, d=3.33. Since c/d = = .06, condition 2-14 

for real poles is met. 

The parameter space with these values substituted for the inverse 
time constants is shown in Figure 3.2. The values of A^, A^ corres- 
ponding to points A-L on Fig. 3.2 are listed in Table 3.1. 
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Response to a nonequilibrium N T (0) is shown in Fig. 3.3. 

As Xtfas predicted, the system is unstable for x<0 . The behavior at points 
K and L indicates that the stability boundary is where it was predicted 
to be. (Note, however, that this space is valid only near the 
operating level, and we cannot draw any conclusions from it about 
very low power operation.) The system may be driven unstable, but 
must be very large. 

Use of the stability boundary equations determined for one delayed 
group neutronics results in a very slight shifting of the boundaries. 

The prompt jump equation is seen to be a good approximation to the point 
reactor kinetic equations in this case. 

Case B 

If with all the other parameters remaining the same as 

in ‘Case A, c = .710, d = 1.775, and c/d = .4. The poles of the feedback 
loop transfer function are now complex. The new stability boundaries 
are shown in Fig. 3.4. Resonant behavior may be expected as the value 
of y is increased for x>0. .State space plots for 'the linearized- equa- 
tions are shown in Fig. 3.5.. Sustained oscillations are observed in 
the vicinity of point F, and the system is unstable at point G. 
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Chapter 4 


CONCLUSION 


The nonlinear equations describing the behavior of a proposed 
nuclear rocket engine were linearized and analyzed. Regions, of linear 
stability and instability in a parameter space were delineated, and a 
general method for the determination of the coordinates of such parameter 
spaces was outlined. 

Two particular systems were chosen for further investigation. They 
corresponded to feedback system transfer functions with and without complex 
poles. The system with complex poles exhibited oscillatory behavior for 
certain values of feedback coefficients, as was predicted. 

Much work remains to be done on the problem of determination of the 
regions of stability for this system. A digital computer code has been 
written to help in the plotting of the stability boundaries . With its 
help, an ■ investigation of the effect of changing only the equilibrium 
power on the stability boundaries may be carried out. 

An attempt has been made to simulate the nonlinear equations on the 
analog computer, but due to the complexity of the problem and inherent 
inaccuracy of the nonlinear computing devices, the results were deemed 
unreliable. However, arrangements have been made with the Electrical 
Engineering Department for use of their PDP-9 digital computer. The 
problem will be coded in DARE, a new digital simulation language. It 
is hoped that with this tool, the nonlinear systems equations may be 
solved and used to verify the predicted system stability. 
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